
Stephen Wolfram: Computational Universe | MIT 6.S099: Artificial General Intelligence (AGI)


Chapters

0:00
10:57 Random Graph
22:13 Feature Space Plots
27:20 What Does the Space of All Possible Programs Look Like
27:42 Cellular Automata
27:51 Cellular Automaton
33:52 Boolean Algebra
37:17 Computational Irreducibility
37:22 Principle of Computational Equivalence
38:39 The First Axiom System That Corresponds to Boolean Algebra
39:49 Proof Graph
50:13 What Is AI Going To Allow Us To Automate
55:56 Symbolic Discourse
56:15 Smart Contracts
63:52 Key Influences
73:53 Gödel's Theorem
77:49 Algorithmic Drugs
79:43 Molecular Computing
85:19 Teach Kids To Be Useful in a World Where Everything Is Computational
100:09 Multi-Digit Arithmetic


00:00:00.000 | Welcome back to 6.S099, Artificial General Intelligence.
00:00:04.960 | Today we have Stephen Wolfram.
00:00:06.820 | (audience applauding)
00:00:12.560 | That's a first, I didn't even get started,
00:00:15.520 | you're already clapping.
00:00:16.720 | In his book, A New Kind of Science,
00:00:19.960 | he has explored and revealed the power,
00:00:21.840 | beauty and complexity of cellular automata
00:00:24.000 | as simple computational systems
00:00:27.160 | for which incredible complexity can emerge.
00:00:30.160 | It's actually one of the books that really inspired me
00:00:32.360 | to get into artificial intelligence.
00:00:34.240 | He's created the Wolfram Alpha
00:00:35.560 | computational knowledge engine,
00:00:37.200 | created Mathematica, which has now expanded
00:00:39.380 | to become Wolfram Language.
00:00:41.120 | Both he and his son were involved in helping analyze
00:00:44.240 | and create the alien language from the movie Arrival,
00:00:47.920 | for which they used the Wolfram Language.
00:00:50.040 | Please again, give Stephen a warm welcome.
00:00:53.240 | (audience applauding)
00:00:56.560 | All right, so I gather the brief here
00:00:58.680 | is to talk about how artificial general intelligence
00:01:01.240 | is going to be achieved.
00:01:02.240 | Is that the basic picture?
00:01:04.840 | So maybe I'm reminded of kind of a story
00:01:07.560 | which I don't think I've ever told in public
00:01:09.280 | but that something that happened
00:01:11.640 | just a few buildings over from here.
00:01:13.040 | So this was 2009 and Wolfram Alpha
00:01:16.320 | was about to arrive on the scene.
00:01:19.060 | I assume most of you have used Wolfram Alpha
00:01:21.120 | or seen Wolfram Alpha, yes?
00:01:23.360 | How many of you have used Wolfram Alpha?
00:01:27.560 | Okay, that's good.
00:01:28.640 | (audience laughing)
00:01:31.120 | So I had long been a friend of Marvin Minsky's
00:01:34.320 | and Marvin was a sort of pioneer of the AI world
00:01:37.880 | and I'd kind of seen for years
00:01:41.680 | question answering systems that tried to
00:01:44.080 | do sort of general intelligence question answering
00:01:48.560 | and so had Marvin.
00:01:49.960 | And so I was gonna show Marvin Wolfram Alpha.
00:01:54.440 | He looks at it and he's like, oh, okay, that's fine, whatever.
00:01:57.620 | And I said, no Marvin, this time it actually works.
00:02:01.860 | You can try real questions.
00:02:03.920 | This is actually something useful.
00:02:05.380 | This is not just a toy.
00:02:07.060 | And it was kind of interesting to see.
00:02:08.800 | It took about five minutes for Marvin to realize
00:02:11.260 | that this was finally a question answering system
00:02:13.860 | that could actually answer questions
00:02:15.300 | that were useful to people.
00:02:16.620 | And so one question is how did we achieve that?
00:02:21.120 | So you go to Wolfram Alpha and you can ask it,
00:02:23.500 | I mean, it's, I don't know what we can ask it.
00:02:25.580 | I don't know, what's the, some random question.
00:02:30.580 | What is the population of Cambridge?
00:02:33.060 | Actually, here's a question, divided by, let's try that.
00:02:36.100 | What's the population of Cambridge?
00:02:37.460 | It's probably gonna figure out.
00:02:39.020 | If we mean Cambridge, Massachusetts,
00:02:40.540 | it's gonna give us some number,
00:02:41.580 | it's gonna give us some plot.
00:02:43.140 | Actually, what I wanna know is number of students
00:02:47.500 | at MIT divided by population of Cambridge.
00:02:51.180 | See if it can figure that out.
00:02:54.740 | And, okay, it's kind of interesting.
00:02:58.260 | Oh, no, that's divided by, ah, that's interesting.
00:03:01.180 | It guessed that we were talking about Cambridge University
00:03:04.220 | as the denominator there.
00:03:05.740 | So it says the number of students at MIT
00:03:08.120 | divided by the number of students at Cambridge University.
00:03:10.300 | That's interesting, I'm actually surprised.
00:03:11.820 | Let's see what happens if I say Cambridge MA there.
00:03:14.620 | No, it'll probably fail horribly.
00:03:17.660 | No, that's good.
00:03:19.140 | Okay, so, no, that's interesting.
00:03:22.660 | That's a plot as a function of time,
00:03:24.940 | of the fraction of the, okay, so anyway.
00:03:28.980 | So I'm glad it works.
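
The same kind of query can also be issued programmatically; a minimal sketch using the Wolfram Language's WolframAlpha function, with illustrative query strings (the second form asks for just the short "Result" answer when one exists):

    WolframAlpha["number of students at MIT divided by population of Cambridge, MA"]
    WolframAlpha["population of Cambridge, MA", "Result"]
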
00:03:32.020 | So one question is how did we manage to get,
00:03:38.080 | so many things have to work
00:03:39.660 | in order to get stuff like this to work.
00:03:41.820 | You have to be able to understand the natural language,
00:03:44.780 | you have to have data sources,
00:03:46.340 | you have to be able to compute things
00:03:47.660 | from the data, and so on.
00:03:49.100 | One of the things that was a surprise to me,
00:03:51.760 | in terms of natural language understanding,
00:03:54.280 | was that the critical thing turned out to be
00:03:57.820 | just knowing a lot of stuff.
00:03:59.780 | The actual parsing of the natural language
00:04:02.660 | is kind of, I think it's kind of clever,
00:04:04.660 | and we use a bunch of ideas that came
00:04:06.220 | from my New Kind of Science project and so on.
00:04:08.860 | But I think the most important thing
00:04:10.500 | is just knowing a lot of stuff about the world
00:04:13.060 | is really important to actually being able
00:04:16.100 | to understand natural language in a useful situation.
00:04:20.140 | I think the other thing is having,
00:04:22.340 | actually having access to lots of data.
00:04:26.080 | Let me show you a typical example here
00:04:27.680 | of what is needed.
00:04:29.120 | So I ask about the ISS, and hopefully it'll wake up
00:04:32.880 | and tell us something here, come on,
00:04:34.240 | what's going on here?
00:04:35.560 | There we go, okay.
00:04:36.680 | So it figured out that we probably are talking
00:04:38.720 | about a spacecraft, not a file format,
00:04:41.000 | and now it's gonna give us a plot
00:04:42.280 | that shows us where the ISS is right now.
00:04:45.680 | So to make this work, we obviously have to have
00:04:48.280 | some feed of radar tracking data about satellites
00:04:52.680 | and so on, which we have for every satellite
00:04:54.440 | that's out there.
00:04:56.440 | But then that's not good enough to just have that feed.
00:04:59.560 | Then you also have to be able to do celestial mechanics
00:05:02.520 | to work out, well, where is the ISS actually right now
00:05:05.800 | based on the orbital elements that have been deduced
00:05:07.960 | from radar, and then if we want to know things like,
00:05:10.600 | okay, when is it going to, it's not currently visible
00:05:13.340 | from Boston, Massachusetts, it will next rise
00:05:15.920 | at 7:36 p.m. today, Monday.
00:05:20.920 | So this requires a mixture of data
00:05:26.680 | about what's going on in the world,
00:05:28.400 | together with models about how the world
00:05:31.080 | is supposed to work, being able to predict things,
00:05:33.120 | and so on.
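
A hedged sketch of the same lookup done with free-form queries from code; the query strings here are illustrative, not the ones typed in the talk:

    WolframAlpha["current position of the ISS"]
    WolframAlpha["when will the ISS next rise over Boston, Massachusetts"]
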
00:05:33.960 | And I think another thing that I kind of realized
00:05:35.680 | about AI and so on from the Wolfram Alpha effort
00:05:42.300 | has been that one of the earlier ideas
00:05:46.840 | for how one would achieve AI was let's make it work
00:05:50.560 | kind of like brains do, and let's make it figure stuff out,
00:05:52.760 | and so if it has to do physics, let's have it do physics
00:05:55.620 | by pure reasoning, like people at least used to do physics.
00:06:00.620 | But in the last 300 years, we've had a different way
00:06:03.960 | to do physics that wasn't sort of based
00:06:05.820 | on natural philosophy.
00:06:07.140 | It was instead based on things like mathematics.
00:06:09.900 | And so one of the things that we were doing
00:06:12.340 | in Wolfram Alpha was to kind of cheat relative
00:06:15.460 | to what had been done in previous AI systems,
00:06:18.340 | which was instead of using kind of reasoning-type methods,
00:06:20.980 | we're just saying, okay, we want to compute
00:06:22.620 | where the ISS is going to be,
00:06:24.300 | we've got a bunch of equations of motion
00:06:25.960 | that correspond to differential equations,
00:06:27.700 | we're just gonna solve the equations of motion
00:06:29.540 | and get an answer.
00:06:30.580 | That's kind of leveraging the last 300 years or so
00:06:33.300 | of exact science that had been done,
00:06:36.100 | rather than trying to make use
00:06:37.500 | of kind of human reasoning ideas.
00:06:39.900 | And I might say that in terms of the history
00:06:43.340 | of the Wolfram Alpha project,
00:06:45.860 | when I was a kid, a disgustingly long time ago,
00:06:49.320 | I was interested in AI kinds of things,
00:06:51.820 | and I, in fact, I was kind of upset recently
00:06:54.780 | to find a bunch of stuff I did when I was 12 years old,
00:06:56.940 | kind of trying to assemble a pre-version of Wolfram Alpha
00:07:00.380 | way back before it was technologically possible,
00:07:02.540 | but it's also a reminder that one just does the same thing
00:07:05.580 | one's whole life, so to speak, at some level.
00:07:09.180 | But what happened was when I started off working mainly
00:07:14.180 | in physics, and then I got involved
00:07:18.440 | in building computer systems to do things
00:07:20.420 | like mathematical computation and so on,
00:07:22.820 | and I then sort of got interested in, okay,
00:07:26.180 | so can we generalize this stuff,
00:07:27.700 | and can we really make systems that can answer
00:07:31.140 | sort of arbitrary questions about the world,
00:07:33.380 | and for example, sort of the promise would be
00:07:38.020 | if there's something that is systematically known
00:07:41.100 | in our civilization, make it automatic to answer questions
00:07:44.020 | on the basis of that systematic knowledge.
00:07:46.240 | And back in around late 1970s, early 1980s,
00:07:50.020 | my conclusion was if you wanted to do something like that,
00:07:52.460 | the only realistic path to being able to do it
00:07:55.060 | was to build something much like a brain.
00:07:57.240 | And so I got interested in neural nets,
00:07:59.020 | and I tried to do things with neural nets back in 1980,
00:08:01.860 | and nothing very interesting happened,
00:08:03.660 | well, I couldn't get them to do anything very interesting.
00:08:06.260 | And that, so I kind of had the idea
00:08:09.500 | that the only way to get the kind of thing
00:08:11.740 | that now exists in Wolfram Alpha, for example,
00:08:14.260 | was to build a brain-like thing.
00:08:16.960 | And then many years later, for reasons I can explain,
00:08:20.300 | I kind of came back to this and realized,
00:08:22.780 | actually, it wasn't true that you had to build
00:08:24.380 | a brain-like thing, sort of mere computation was sufficient.
00:08:27.820 | And that was kind of what got me started
00:08:29.780 | actually trying to build Wolfram Alpha.
00:08:31.660 | When we started building Wolfram Alpha,
00:08:33.460 | one of the things I did was go to a sort of a field trip
00:08:35.700 | to a big reference library, and you see all these shelves
00:08:39.380 | of books and so on, and the question is,
00:08:41.300 | can we take all of this knowledge that exists
00:08:43.260 | in all of these books and actually automate
00:08:46.180 | being able to answer questions on the basis of it?
00:08:48.400 | And I think we've pretty much done that.
00:08:49.980 | at least for the books you find
00:08:51.900 | in a typical reference library.
00:08:54.060 | So that was, it looked kind of daunting at the beginning
00:08:57.260 | because there's a lot of knowledge and information
00:08:59.980 | out there, but actually it turns out
00:09:01.860 | there are a few thousand domains,
00:09:03.620 | and we've steadily gone through and worked
00:09:05.860 | on these different domains.
00:09:07.300 | Another feature of the Wolfram Alpha project
00:09:09.120 | was that we didn't really, you know,
00:09:11.140 | I'd been involved a lot in doing basic science
00:09:13.360 | and in trying to have sort of grand theories of the world.
00:09:15.980 | One of my principles in building Wolfram Alpha
00:09:17.920 | was not to start from a grand theory of the world.
00:09:20.100 | That is, not to kind of start from some global ontology
00:09:22.980 | of the world and then try and build down
00:09:25.020 | into all these different domains,
00:09:26.220 | but instead to work up from having, you know,
00:09:28.980 | hundreds, then thousands of domains that actually work,
00:09:31.580 | whether they're, you know, information about cars
00:09:34.220 | or information about sports or information about movies
00:09:37.260 | or whatever else, have each of these domains
00:09:40.440 | sort of building up from the bottom
00:09:41.980 | in each of these domains, and then finding
00:09:44.220 | that there were common themes in these domains
00:09:46.020 | that we could then build into frameworks
00:09:48.020 | and then sort of construct the whole system
00:09:50.000 | on the basis of that, and that's kind of how it's worked,
00:09:52.740 | and I can talk about some of the actual frameworks
00:09:54.620 | that we end up using and so on.
00:09:57.040 | But maybe I should explain a little bit more.
00:10:01.440 | So one question is how does Wolfram Alpha
00:10:04.760 | actually sort of work inside?
00:10:06.960 | And the answer is it's a big program.
00:10:09.320 | The core system is about 15 million lines
00:10:12.880 | of Wolfram Language code, plus some number
00:10:15.840 | of terabytes of raw data.
00:10:17.960 | And so the way, the thing that sort of made
00:10:24.000 | building Wolfram Alpha possible was this language,
00:10:26.940 | Wolfram Language, which started with Mathematica,
00:10:30.520 | which came out in 1988, and has been sort of
00:10:33.400 | progressively growing since then.
00:10:35.360 | So maybe I should show you some things
00:10:36.880 | about Wolfram language, and it's easy.
00:10:39.800 | You can go use this.
00:10:41.280 | MIT has a site license for it.
00:10:42.920 | You can use it all over the place.
00:10:45.320 | You can find it on the web, et cetera, et cetera, et cetera.
00:10:48.800 | But, okay, the basics work.
00:10:54.440 | Let's start off with something like,
00:10:57.180 | let's make a random graph, and let's say we have
00:11:01.420 | a random graph with 200 nodes and 400 edges.
00:11:04.220 | Okay, so there's a random graph.
00:11:05.900 | The first important thing about Wolfram Language
00:11:07.780 | is that it's a symbolic language.
00:11:09.260 | So I can just pick up this graph, and I could say,
00:11:11.980 | you know, let's do some analysis of this graph.
00:11:14.500 | That graph is just a symbolic thing
00:11:16.500 | that I can just do computations on.
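
A minimal sketch of that first demo; RandomGraph[{n, m}] gives a random graph with n vertices and m edges, and the analysis functions shown here are just examples of what can be applied to the symbolic graph object:

    g = RandomGraph[{200, 400}]         (* 200 vertices, 400 edges *)
    EdgeCount[g]                        (* the graph is just an expression we can compute with *)
    Length[FindGraphCommunities[g]]     (* e.g. how many communities it splits into *)
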
00:11:19.060 | Or I could say, let's get a, another good thing
00:11:23.020 | to always do is get a current image.
00:11:25.740 | See, there we go.
00:11:27.060 | And now I could go and say something like,
00:11:30.560 | let's do some basic thing.
00:11:33.540 | Let's say, let's edge detect that image.
00:11:35.900 | Again, this image is just a thing that we can manipulate.
00:11:40.300 | We could take the image, we could make it,
00:11:42.400 | I don't know, we could take the image
00:11:45.800 | and partition it into little pieces,
00:11:47.860 | do computations on that.
00:11:49.740 | I don't know, simple.
00:11:50.860 | Let's do, let's just say, sort each row of the image,
00:11:55.100 | assemble the image again, oops.
00:11:57.100 | Assemble that image again,
00:12:02.180 | we'll get some mixed up picture there.
00:12:04.620 | If I wanted to, I could, for example,
00:12:06.140 | let's say, let's make that the current image,
00:12:08.020 | and let's say, make that dynamic.
00:12:10.940 | I can be just running that code, hopefully,
00:12:13.700 | and a little loop, and there we can make that work.
00:12:17.780 | So, one general point here is,
00:12:22.780 | this is just an image for us,
00:12:25.900 | it's just a piece of data like anything else.
00:12:28.020 | If we just have a variable, a thing called x,
00:12:30.300 | it just says, okay, that's x,
00:12:31.880 | I don't need to know a particular value,
00:12:33.580 | it's just a symbolic thing that corresponds to,
00:12:37.440 | that's a thing called x.
00:12:39.220 | Now, what gets interesting when you have
00:12:43.420 | symbolic language and so on is,
00:12:45.240 | we're interested in having it represent
00:12:46.760 | stuff about the world, as well as
00:12:48.980 | just abstract kinds of things.
00:12:50.180 | I mean, I can abstractly say,
00:12:52.780 | find some funky integral, I don't know what,
00:12:57.340 | that's then representing, using symbolic variables
00:13:01.680 | to represent algebraic kinds of things.
00:13:03.480 | But I could also just say, I don't know,
00:13:05.740 | something like Boston.
00:13:07.660 | And Boston is another kind of symbolic thing
00:13:10.500 | that has, if I say, what is it really inside?
00:13:14.020 | That's the entity, a city, Boston,
00:13:17.580 | Massachusetts, United States.
00:13:19.540 | Actually, you notice when I typed that in,
00:13:21.040 | I was using natural language to type it in,
00:13:23.800 | and it gave me a bunch of disambiguation here.
00:13:26.200 | It said, assuming Boston is a city,
00:13:29.180 | assuming Boston, Massachusetts,
00:13:30.600 | use Boston, New York, or, okay,
00:13:32.340 | let's use Boston and the Philippines,
00:13:34.960 | which I've never heard of, but let's try using that instead.
00:13:38.920 | And now, if I look at that, it'll say
00:13:41.160 | it's Boston and some province of the Philippines,
00:13:43.720 | et cetera, et cetera, et cetera.
00:13:45.000 | Now, I might ask it, of that, I could say something like,
00:13:48.880 | what's the population of that?
00:13:52.800 | And it, okay, it's a fairly small place.
00:13:57.640 | Or I could say, for example, let me do this.
00:13:59.760 | Let me say, a geolist plot from that Boston,
00:14:04.480 | let's take from that Boston, two,
00:14:09.000 | and now let's type in Boston again,
00:14:10.720 | and now let's have it use the default meaning
00:14:12.480 | of the word of Boston, and then let's join those up,
00:14:16.160 | and now this should plot, this should show me a plot.
00:14:20.600 | There we go, okay, so there's the path from the Boston
00:14:24.760 | that we picked in the Philippines to the Boston here.
00:14:27.240 | Or we could ask it, I don't know, I could just say,
00:14:29.800 | I could ask it the distance from one to another
00:14:31.640 | or something like that.
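
A sketch of the entity demo; the Massachusetts entity uses the standard canonical name, while the canonical name for the Philippine Boston is a guess (in the talk it came from the natural-language input box):

    bostonMA = Entity["City", {"Boston", "Massachusetts", "UnitedStates"}];
    bostonPH = Entity["City", {"Boston", "DavaoOriental", "Philippines"}];   (* canonical name assumed *)
    bostonMA["Population"]
    GeoListPlot[{bostonPH, bostonMA}, Joined -> True]
    GeoDistance[bostonPH, bostonMA]
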
00:14:33.160 | So, one of the things here,
00:14:35.720 | one of the things we found really, really useful, actually,
00:14:39.040 | in Wolfram Language, so first of all,
00:14:40.720 | there's a way of representing stuff about the world,
00:14:43.080 | like cities, for example.
00:14:45.080 | Or let's say I want to say, let's do this.
00:14:46.960 | Let's say, let's do something with cities.
00:14:51.120 | Let's say capital cities in South America.
00:14:53.920 | Okay, so notice, this is a piece of natural language.
00:14:56.840 | This will get interpreted into something
00:14:59.720 | which is precise, symbolic Wolfram Language code
00:15:02.640 | that we can then compute with,
00:15:05.320 | and that will give us the cities in South,
00:15:06.960 | capital cities in South America.
00:15:09.000 | I could, for example, let's say I say, find shortest tour.
00:15:12.000 | So now I'm going to use some, oops,
00:15:16.040 | no, I don't want to do that.
00:15:17.120 | What I want to do first is to say,
00:15:18.920 | show me the geopositions of all those cities
00:15:22.880 | on line 21 there.
00:15:24.440 | So now it will find the geopositions,
00:15:26.360 | and now it will say, compute the shortest tour.
00:15:29.200 | So that's saying there's a 10,000 mile
00:15:32.200 | traveling salesman tour around those cities,
00:15:34.920 | so I could take those cities that are on line 21,
00:15:37.320 | and I could say, order the cities according to this,
00:15:41.240 | and then I could make another geolist plot of that,
00:15:45.320 | join it up, and this should now show us
00:15:49.280 | a traveling salesman tour of the capital cities
00:15:52.840 | in South America.
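
A sketch of the shortest-tour demo, assuming the capitals come back as city entities (the EntityClass specification is an assumption; in the talk the list came from natural-language input):

    capitals = EntityValue[EntityClass["Country", "SouthAmerica"], "CapitalCity"];
    positions = GeoPosition /@ capitals;
    {length, order} = FindShortestTour[positions];
    GeoListPlot[capitals[[order]], Joined -> True]
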
00:15:53.820 | So, it's sort of interesting to see what's involved
00:15:58.520 | in making stuff like this work.
00:16:03.000 | One of, my goal has been to sort of automate
00:16:07.000 | as much as possible about things that have to be computed,
00:16:10.920 | and that means knowing as many algorithms as possible,
00:16:14.400 | and also knowing as much data about the world as possible.
00:16:18.200 | And I kind of view this as sort of a knowledge-based
00:16:21.040 | programming approach, where you have a typical kind of idea
00:16:25.840 | in programming languages is, you have some small
00:16:28.400 | programming languages, has a few primitives
00:16:30.480 | that are pretty much tied into what a machine
00:16:32.920 | can intrinsically do, and then maybe you'll have libraries
00:16:36.800 | that add on to that and so on.
00:16:38.500 | My kind of crazy idea of many, many years ago
00:16:42.720 | has been to build an integrated system
00:16:45.440 | where all of the stuff about different domains
00:16:48.200 | of knowledge and so on are all just built into the system
00:16:51.760 | and designed in a coherent way.
00:16:54.500 | I mean, this has been kind of the story of my life
00:16:56.400 | for the last 30 years, is trying to keep the design
00:16:58.560 | of the system coherent, even as one adds
00:17:02.600 | all sorts of different areas of capability.
00:17:07.000 | So, as, I mean, we can go and dive into all sorts
00:17:11.240 | of different kinds of things here, but maybe as an example,
00:17:15.280 | well, let's do, what could we do here?
00:17:17.120 | We could take, let's try, how about this?
00:17:20.680 | Is that a bone?
00:17:21.760 | I think so, that's a bone.
00:17:23.480 | So let's try that.
00:17:27.160 | (keyboard clicking)
00:17:29.160 | As a mesh region.
00:17:30.440 | See if that works.
00:17:32.520 | So this will now use a completely different domain
00:17:34.880 | of human endeavor.
00:17:36.880 | Okay, oops, there's two of those bones.
00:17:38.720 | Let's try, let's just try, let's try left humerus,
00:17:43.720 | and let's try that, the mesh region for that,
00:17:48.520 | and now we should have a bone here.
00:17:50.480 | Okay, there's a representation of a bone.
00:17:52.960 | Let's take that bone, and we could, for example,
00:17:55.040 | say, let's take the surface area of that,
00:17:59.320 | as in some units, or I could, let's do some much more
00:18:02.560 | outrageous thing.
00:18:03.400 | Let's say we take region distance.
00:18:06.760 | So we're going to take the distance from that bone
00:18:11.760 | to a point, let's say, zero, zero, Z,
00:18:16.760 | and let's make a plot of that distance with Z going from,
00:18:21.760 | let's say, I have no idea where this bone is,
00:18:24.960 | but let's try something like this.
00:18:26.600 | So that was really boring.
00:18:28.120 | Let's try, so what this is doing, again,
00:18:34.320 | a whole bunch of stuff has to work
00:18:35.680 | in order for this to operate.
00:18:37.560 | This has to be, this is some region in 3D space
00:18:40.840 | that's represented by some mesh.
00:18:42.680 | You have to compute, you know,
00:18:44.040 | do the computational geometry to figure out where it is.
00:18:46.320 | If I wanted to, let's try anatomy plot 3D,
00:18:51.600 | and let's say something like left hand, for example,
00:18:55.440 | and now it's going to show us probably the complete data
00:18:58.120 | that it has about the geometry of a left hand.
00:19:03.120 | There we go.
00:19:07.200 | Okay, so there's the result, and we could take that apart
00:19:09.800 | and start computing things from it and so on.
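
A sketch of the anatomy demo; AnatomyData and AnatomyPlot3D are the functions shown, while the exact property name and the plot range here are assumptions:

    bone = AnatomyData[Entity["AnatomicalStructure", "LeftHumerus"], "MeshRegion"];
    Area[bone]                                            (* surface area, if the mesh comes back as a surface *)
    Plot[RegionDistance[bone, {0, 0, z}], {z, 0, 2000}]   (* distance from the bone to points along an axis *)
    AnatomyPlot3D[Entity["AnatomicalStructure", "LeftHand"]]
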
00:19:12.080 | So what, so this is,
00:19:18.960 | so there's a lot of kind of computational knowledge
00:19:23.960 | that's built in here.
00:19:25.840 | One, let's talk a little bit about
00:19:29.640 | kind of the modern machine learning story.
00:19:31.560 | So for instance, if I say, let's get a picture here.
00:19:34.520 | Let's say, let's just say picture of,
00:19:38.560 | has anybody got a favorite kind of animal?
00:19:40.660 | What?
00:19:44.320 | Panda.
00:19:45.160 | Okay, so let's try, okay, giant panda.
00:19:50.160 | Okay, okay, there's a panda.
00:19:55.120 | Let's see what, now let's try saying,
00:19:58.160 | let's try for this panda, let's try saying image identify,
00:20:03.840 | and now here we'll be embarrassed probably,
00:20:05.640 | but let's just see, let's see what happens.
00:20:07.520 | If I say image identify that,
00:20:09.840 | and now it'll hopefully, wake up, wake up, wake up.
00:20:14.160 | This only takes a few hundred milliseconds.
00:20:15.800 | Okay, very good, giant panda.
00:20:17.560 | Let's see what the runners up were to the giant panda.
00:20:21.040 | Let's say we want to say the 10 runners up
00:20:29.240 | in all categories for that thing, okay.
00:20:32.280 | So a giant panda, a procyonid, which I've never heard of.
00:20:37.280 | Are pandas carnivorous?
00:20:39.220 | They eat bamboo shoots, okay.
00:20:42.520 | So that was so lucky it didn't get that one.
00:20:45.240 | It's really sure it's a mammal,
00:20:46.800 | and it's absolutely certain it's a vertebrate.
00:20:49.160 | Okay, so you might ask, how did it figure this out?
00:20:53.340 | And so then you can kind of look under the hood and say,
00:20:57.680 | so we have a whole framework
00:20:59.280 | for representing neural nets symbolically.
00:21:01.720 | And so this is the actual model that it's using to do this.
00:21:05.520 | So this is a, so there's a neural net,
00:21:08.120 | and it's got, we can drill down,
00:21:10.040 | and we can see there's a piece of the neural net.
00:21:12.920 | We can drill down even further to one of these,
00:21:14.760 | and we can probably see what,
00:21:15.820 | that's a batch normalization layer,
00:21:17.600 | somewhere deep, deep inside the entrails of the,
00:21:20.840 | not panda, but of this thing, okay.
00:21:24.240 | So now let's take that object,
00:21:25.760 | which is just a symbolic object,
00:21:27.240 | and let's feed it the picture of the panda.
00:21:29.800 | And we can see, and there, oops.
00:21:33.720 | I was not giving it the right thing.
00:21:36.200 | What did I just do wrong here?
00:21:37.640 | Oh, here, let's take, oh, I see what I did.
00:21:40.000 | Okay, let's take this thing
00:21:41.680 | and feed it the picture of the panda,
00:21:43.240 | and it says it's a giant panda, okay.
00:21:45.320 | How about we do something more outrageous?
00:21:47.120 | Let's take that neural net,
00:21:48.800 | and let's only use the first, let's say,
00:21:51.080 | 10 layers of the neural net.
00:21:53.000 | So let's just take out 10 layers of the neural net
00:21:55.360 | and feed it the panda.
00:21:57.000 | And now what we'll get is something
00:21:58.960 | from the insides of the neural net,
00:22:01.120 | and I could say, for example,
00:22:02.520 | let's just make those into images.
00:22:03.840 | Okay, so that's what the neural net had figured out
00:22:08.160 | about the panda after 10 layers
00:22:10.680 | of going through the neural net.
00:22:12.240 | And maybe, actually, it'd be interesting to see,
00:22:13.880 | let's do a feature space plot.
00:22:15.600 | So now we're going to, of those intermediate things
00:22:19.920 | in the brain of the neural net, so to speak,
00:22:22.920 | this is now taking, so what this is just doing
00:22:25.800 | is to do dimension reduction on this space of images,
00:22:30.800 | and so it's not very exciting.
00:22:33.080 | It's probably mostly distinguishing these
00:22:34.680 | by total gray level, but that's kind of showing us
00:22:37.600 | the space of different sort of features
00:22:41.800 | of the insides of this neural net.
00:22:43.280 | So it's also, what's interesting to see here
00:22:45.760 | is things like the symbolic representation
00:22:47.680 | of the neural net, and if you're wondering
00:22:49.240 | how does that actually work inside,
00:22:51.080 | it's underneath, it's using MXNet,
00:22:52.880 | which we happen to have contributed to a lot,
00:22:55.280 | and there's sort of a bunch of symbolic layers
00:22:57.280 | on top of that that feed into that.
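
A sketch of the look-under-the-hood part; the NetModel name is an assumption, and Take assumes the network is a plain chain of layers:

    net = NetModel["Wolfram ImageIdentify Net V1"];   (* model name assumed *)
    net[panda]
    partial = Take[net, 10];                          (* keep only the first 10 layers *)
    features = partial[panda];                        (* the internal activations after 10 layers *)
    FeatureSpacePlot[Image /@ Normal[features]]       (* dimension-reduce those feature maps as images *)
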
00:23:00.080 | And maybe I can show you here.
00:23:01.240 | Let me show you how you would train one of these neural nets.
00:23:03.400 | That's also kind of fun.
00:23:04.840 | So we have a data repository
00:23:08.040 | that has all sorts of useful data.
00:23:10.080 | One piece of data it has is a bunch
00:23:12.560 | of neural net training sets, so this is
00:23:14.840 | the standard MNIST training set of handwritten digits.
00:23:19.000 | Okay, so there's MNIST, and you notice
00:23:21.320 | that these things here, that's just an image
00:23:23.240 | which I could copy out, and I could do,
00:23:25.800 | let's say I could do color negate on that image
00:23:28.960 | 'cause it's just an image, and there's the result and so on.
00:23:33.200 | And now I could say, let's take a neural net,
00:23:36.400 | like let's take a simple neural net like LeNet,
00:23:38.560 | for example, okay, so let's take LeNet,
00:23:42.240 | and then let's take the untrained evaluation network.
00:23:47.200 | So this is now a version of LeNet,
00:23:49.200 | a simple, standard neural net that didn't get trained.
00:23:52.480 | So for example, if I take that symbolic representation
00:23:56.120 | of LeNet, and I could say net initialize,
00:23:59.200 | then it will take that, and it'll just put random weights
00:24:03.000 | into LeNet, okay, so if I take those random weights,
00:24:06.320 | and I feed it a zero here, I feed it that image of a zero,
00:24:10.320 | it will presumably produce something completely random,
00:24:12.840 | in this particular case, two, right?
00:24:15.120 | So now what I would like to do is to take this,
00:24:18.000 | so that was just randomly initializing the weights.
00:24:20.480 | So now what I'd like to do is to take
00:24:22.360 | the MNIST training set, and I'd like to actually train
00:24:26.120 | Lynette using MNIST training set, so let's take this,
00:24:31.160 | and let's take a random sample of,
00:24:35.520 | let's say, I don't know, 1,000 pieces of Lynette.
00:24:39.400 | Come on, why is it having to load it again?
00:24:42.120 | There we go, okay, so there's a random sample there,
00:24:45.520 | it was on line 21, and now let me go down here,
00:24:48.640 | and say, where was it?
00:24:51.280 | Well, we can just take this thing here,
00:24:54.280 | so this is the uninitialized version of LeNet,
00:24:58.200 | and we can say take that, and then let's say
00:25:00.680 | net train of that with the thing on line 21,
00:25:04.840 | which was that 1,000 instances.
00:25:06.520 | So now what it's doing is it's running training on,
00:25:10.560 | and that's, you see the loss going down and so on.
00:25:13.600 | It's running training with those 1,000 instances for LeNet,
00:25:18.600 | and it will, we can stop it if we want to.
00:25:22.160 | Actually, this is a new display, this is very nice.
00:25:24.520 | This is a new version of Wolfram Language,
00:25:26.440 | which is coming out next week, which I'm showing you,
00:25:29.160 | but it's quite similar to what exists today,
00:25:31.360 | but because that's one of the features
00:25:33.800 | of running a software company is that you always run
00:25:35.800 | the very latest version of things, for better or worse,
00:25:39.520 | and this is also a good way to debug it,
00:25:41.360 | 'cause it's supposed to come out next week.
00:25:42.880 | If I find some horrifying bug, maybe it will get delayed,
00:25:46.520 | but let's try, let's try this.
00:25:51.520 | Okay, now it says it's zero, okay,
00:25:54.760 | and so this is now a trained version of LeNet,
00:25:57.720 | trained with that training data.
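
A sketch of the training demo; ResourceData, NetModel, NetInitialize and NetTrain are the functions used, while the exact resource and property names are from memory:

    mnist   = ResourceData["MNIST"];                            (* a list of image -> digit rules *)
    sample  = RandomSample[mnist, 1000];
    lenet   = NetModel["LeNet", "UninitializedEvaluationNet"];  (* LeNet with no trained weights *)
    NetInitialize[lenet][sample[[1, 1]]]                        (* random weights: essentially a random guess *)
    trained = NetTrain[lenet, sample];
    trained[sample[[1, 1]]]                                     (* now it should label the digit correctly *)
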
00:26:00.560 | One of the things, so we can talk about all kinds
00:26:04.760 | of details of neural nets and so on,
00:26:06.960 | but maybe I should zoom out to talk a little bit
00:26:08.760 | about bigger picture as I see it.
00:26:10.760 | So one question is, sort of a question of what is
00:26:17.080 | in principle possible to do with computation?
00:26:21.400 | So we have, as we're building all kinds of things,
00:26:24.800 | we're making image identifiers,
00:26:26.320 | we're figuring out all kinds of things
00:26:28.480 | about where the International Space Station is and so on.
00:26:31.080 | Question is, what is in principle possible to compute?
00:26:35.560 | And so one of the places one can ask that question
00:26:39.560 | is when one looks at, for example,
00:26:41.760 | models of the natural world.
00:26:43.120 | One can say, how do we make models of the natural world?
00:26:46.120 | Kind of a traditional approach has been,
00:26:48.760 | let's use mathematical equations
00:26:50.320 | to make models of the natural world.
00:26:52.120 | A question is, if we want to kind of generalize that
00:26:55.280 | and say, well, what are all possible ways
00:26:56.920 | to make models of things,
00:26:58.560 | what can we say about that question?
00:27:00.880 | So I spent many years of my life
00:27:02.880 | trying to address that question.
00:27:04.880 | And basically what I've thought about a lot
00:27:08.040 | is that if you want to make a model of a thing,
00:27:10.680 | you have to have definite rules
00:27:12.080 | by which the thing operates.
00:27:13.560 | What's the most general way to represent possible rules?
00:27:16.920 | Well, in today's world, we think of that as a program.
00:27:19.520 | So the next question is, well,
00:27:20.600 | what does the space of all possible programs look like?
00:27:23.920 | And most of the time, we're writing programs
00:27:26.040 | like Wolfram Language is 50 million lines of code,
00:27:29.120 | and it's a big, complicated program
00:27:30.880 | that was built for a fairly specific purpose.
00:27:34.080 | But the question is, if we just look at
00:27:35.880 | sort of the space of possible programs,
00:27:38.040 | more or less at random,
00:27:38.960 | what's out there in the space of possible programs?
00:27:40.960 | So I got interested many years ago in cellular automata,
00:27:44.640 | which are a really good example
00:27:45.960 | of a very simple kind of program.
00:27:48.080 | So let me show you an example of one of these.
00:27:49.960 | So these are the rules for a typical cellular automaton.
00:27:54.200 | And this just says you have a row of black and white squares,
00:27:58.200 | and this just says you look at a square,
00:28:01.200 | say what color is that square,
00:28:02.360 | what color are its left and right neighbors,
00:28:05.020 | decide what color of the square will be on the next step
00:28:07.240 | based on that rule.
00:28:08.600 | Okay, so really simple rule.
00:28:10.360 | So now let's take a look at what actually happens
00:28:13.520 | if we use that rule a bunch of times.
00:28:15.400 | So we can take that rule,
00:28:17.320 | the 254 is just the binary digits
00:28:20.200 | that correspond to those positions in this rule.
00:28:23.160 | So now I can say this, I could say let's do 50 steps,
00:28:26.800 | let me do this.
00:28:28.560 | And now if I run according to the rule I just defined,
00:28:33.560 | it turns out to be pretty trivial.
00:28:35.480 | It's just saying, we start off with a black square,
00:28:39.080 | and then, if a square
00:28:40.640 | or any of its neighboring squares is black,
00:28:43.960 | make that square black.
00:28:44.880 | So we've used a very simple program.
00:28:47.140 | We got a very simple result out.
00:28:49.040 | Okay, let's try a different program.
00:28:50.400 | We can try changing this.
00:28:52.040 | We'll get, that's a program with one bit different.
00:28:55.960 | Now we get that kind of pattern.
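
A sketch of the cellular automaton demo; CellularAutomaton[rule, init, steps] runs an elementary rule, {{1}, 0} means one black cell on a white background, and rule 250 is my guess at the one-bit variant shown:

    RulePlot[CellularAutomaton[254]]                   (* the eight cases of the rule *)
    ArrayPlot[CellularAutomaton[254, {{1}, 0}, 50]]    (* 50 steps from a single black cell *)
    ArrayPlot[CellularAutomaton[250, {{1}, 0}, 50]]    (* a rule that differs by one bit *)
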
00:28:57.840 | So the question is, well, what happens,
00:28:59.880 | you might say, okay, if you've got such a trivial program,
00:29:03.120 | it's not surprising you're just gonna get
00:29:04.400 | trivial results out.
00:29:06.000 | So, but you can do an experiment to test that hypothesis.
00:29:09.000 | You can just say, let's take all possible programs,
00:29:11.440 | there are 256 possible programs
00:29:13.920 | that are based on these eight bits here.
00:29:16.360 | Let's just take, well, let's just, whoops.
00:29:19.420 | Let's just take, let's say the first 64 of those programs
00:29:24.420 | and let's just make a, there we go.
00:29:28.100 | Let's just make a table of the results that we get
00:29:32.020 | by running those first 64 programs here.
00:29:35.460 | So here we get the result.
00:29:37.280 | And what you see is, well, most of them are pretty trivial.
00:29:39.700 | They start off with one black cell in the middle
00:29:42.020 | and it just tools off to one side.
00:29:44.180 | Occasionally we get something more exciting happening
00:29:46.100 | like here's a nice nested pattern that we get.
00:29:48.540 | If we were to continue it longer,
00:29:49.700 | it would make more detailed nesting.
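
The enumeration experiment is essentially one line; a sketch showing the first 64 of the 256 elementary rules:

    Table[ArrayPlot[CellularAutomaton[r, {{1}, 0}, 50]], {r, 0, 63}]
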
00:29:53.560 | But then, my all time favorite science discovery,
00:29:57.700 | if you go on and just look at these,
00:29:59.620 | after a while you find this one here,
00:30:02.140 | which is rule 30 in this numbering scheme.
00:30:06.100 | And that's doing something a bit more complicated.
00:30:08.260 | You say, well, what's going on here?
00:30:10.020 | We just started off with this very simple rule.
00:30:12.260 | Let's see what happens.
00:30:13.100 | Maybe after a while, if we run rule 30
00:30:15.900 | long enough, it will resolve into something simpler.
00:30:18.940 | So let's try running it, let's say 500 steps.
00:30:21.400 | And that's the, whoops, that's the result we get.
00:30:24.820 | Let's say, let's just make it full screen.
00:30:29.820 | Okay, it's aliasing a bit on the projector there,
00:30:35.480 | but you get the basic idea.
00:30:37.720 | This is a, so this just started off
00:30:39.660 | from one black cell at the top and this is what it made.
00:30:42.820 | And that's pretty weird because all this is,
00:30:45.340 | you know, this is sort of not the way it's supposed,
00:30:48.220 | things are supposed to work.
00:30:49.340 | 'Cause what we have here is just that little program
00:30:52.480 | down there and it makes this big complicated pattern here.
00:30:55.980 | And, you know, we can see there's a certain amount
00:30:57.620 | of regularity on one side, but for example,
00:31:00.100 | the center column of this pattern is,
00:31:02.180 | for all practical purposes, completely random.
00:31:04.220 | In fact, it was, we used it as the random number generator
00:31:07.380 | in mathematical and morphem language for many years.
00:31:09.820 | It was recently retired after excellent service
00:31:12.980 | because we found a somewhat more efficient one.
00:31:15.280 | But the, so, you know, what do we learn from this?
00:31:21.320 | What we learn from this is out in the computational universe
00:31:24.660 | of possible programs, it's possible to get,
00:31:27.840 | even with very simple programs,
00:31:29.620 | very rich, complicated behavior.
00:31:31.900 | Well, that's important if you're interested
00:31:33.300 | in modeling the natural world because you might think
00:31:36.560 | that there are programs that represent systems in nature
00:31:39.600 | that might work this way and so on.
00:31:40.940 | It's also important for technology because it says,
00:31:44.300 | okay, let's say you're trying to find a,
00:31:48.220 | let's say you're trying to find a program
00:31:49.460 | that's a good random number generator.
00:31:50.740 | How are you gonna do that?
00:31:52.060 | Well, you could start thinking very hard
00:31:53.940 | and you could try and make up, you know,
00:31:55.220 | you could try and write down all kinds of flow charts
00:31:57.820 | about how this random number generator is going to work.
00:32:00.100 | Or you can say, forget that, I'm just gonna search
00:32:02.120 | the computational universe of possible programs
00:32:04.500 | and just look for one that serves
00:32:06.840 | as a good random number generator.
00:32:08.100 | In this particular case, after you've searched 30 programs,
00:32:11.260 | you'll find one that makes a good random number generator.
00:32:13.760 | Why does it work?
00:32:15.140 | That's a complicated story.
00:32:16.740 | It's not a story that I think necessarily
00:32:19.320 | we can really tell very well.
00:32:21.260 | But what's important is that this idea
00:32:24.120 | that out in the computational universe,
00:32:25.660 | there's a lot of rich, sophisticated stuff
00:32:29.140 | that can be essentially mined for our technological purposes.
00:32:32.560 | That's the important thing.
00:32:34.220 | Whether we understand how this works is a different matter.
00:32:37.540 | I mean, it's like when we look at the natural world,
00:32:39.500 | the physical world, we're used to kind of mining things.
00:32:42.220 | You know, we started using magnets to do magnetic stuff
00:32:45.860 | long before we understood the theory
00:32:48.140 | of ferromagnetism and so on.
00:32:50.020 | And so similarly here, we can sort of go out
00:32:52.880 | into the computational universe
00:32:54.260 | and find stuff that's useful for our purposes.
00:32:57.260 | Now, in fact, the world of sort of deep learning
00:33:00.820 | and neural nets and so on is a little bit like this.
00:33:02.900 | It uses the trick that there's a certain degree
00:33:05.340 | of differentiability there,
00:33:06.600 | so you can kind of home in on let's try
00:33:09.380 | and find something that's incrementally better.
00:33:11.660 | And for certain kinds of problems, that works pretty well.
00:33:14.480 | I think the thing that we've done a lot, I've done a lot,
00:33:17.280 | is just sort of exhaustive search
00:33:19.380 | in the computational universe of possible programs.
00:33:21.420 | Just search a trillion programs and try and find one
00:33:24.140 | that does something interesting and useful for you.
00:33:27.340 | There's a lot of things to say about what,
00:33:30.220 | well, actually, in the search a trillion programs
00:33:32.340 | and find one that's useful,
00:33:33.860 | let me show you another example of that.
00:33:36.500 | Let's see.
00:33:37.340 | So I was interested a while ago in,
00:33:41.140 | I have to look something up here, sorry.
00:33:44.980 | Let me see here.
00:33:48.860 | In Boolean algebra,
00:33:54.620 | and I was interested in the space
00:33:57.880 | of all possible mathematics.
00:33:59.760 | And let me just see here.
00:34:05.100 | I'm not finding what I wanted to find, sorry.
00:34:08.700 | That was a good example.
00:34:11.420 | I should have memorized this, but I haven't.
00:34:14.980 | So here we go.
00:34:17.940 | There it is.
00:34:18.780 | So I was interested in if you just look at,
00:34:25.080 | so we talked about sort of looking at
00:34:27.220 | the space of all possible,
00:34:29.100 | the space of all possible programs.
00:34:34.100 | Another thing you can do is say,
00:34:35.420 | if you're gonna invent mathematics from nothing,
00:34:37.940 | what possible axiom systems could we use in mathematics?
00:34:40.540 | So I was curious, where do,
00:34:43.380 | and that, again, might seem like
00:34:45.020 | a completely crazy thing to do,
00:34:46.860 | to just say, let's just start enumerating axiom systems
00:34:49.540 | at random and see if we find one
00:34:50.860 | that's interesting and useful.
00:34:52.340 | But it turns out, once you have this idea
00:34:56.040 | that out in the computational universe of possible programs,
00:34:59.060 | there's actually a lot of low-hanging fruit to be found,
00:35:02.020 | it turns out you can apply that idea in lots of places.
00:35:04.220 | I mean, the thing to understand is,
00:35:05.500 | why do we not see a lot of engineering structures
00:35:08.220 | that look like this?
00:35:09.060 | The reason is because our traditional model
00:35:11.900 | of engineering has been, we engineer things
00:35:14.380 | in a way where we can foresee what the outcome
00:35:16.940 | of our engineering steps are going to be.
00:35:19.660 | And when it comes to something like this,
00:35:21.300 | we can find it out in the computational universe,
00:35:23.780 | but we can't readily foresee what's going to happen.
00:35:25.740 | We can't do sort of a step-by-step design
00:35:28.380 | of this particular thing.
00:35:29.580 | And so in engineering, in human engineering,
00:35:31.780 | as it's been practiced so far,
00:35:33.860 | most of it has consisted of building things
00:35:36.660 | where we can foresee step-by-step
00:35:38.820 | what the outcome of our engineering is going to be.
00:35:40.480 | And we see that in programs,
00:35:41.980 | we see that in other kinds of engineering structures.
00:35:45.460 | And so there's sort of a different kind of engineering,
00:35:47.260 | which is about mining the computational universe
00:35:49.780 | of possible programs.
00:35:50.860 | And it's worth realizing there's a lot more
00:35:53.500 | that can be done a lot more efficiently
00:35:55.300 | by mining the computational universe of possible programs
00:35:58.220 | than by just constructing things step-by-step as a human.
00:36:00.660 | So for example, if you look for optimal algorithms
00:36:03.060 | for things, like, I don't know,
00:36:04.420 | even something like sorting networks,
00:36:06.520 | the optimal sorting networks look very complicated.
00:36:09.700 | They're not things that you would construct
00:36:12.260 | by sort of step-by-step thinking about things
00:36:15.660 | with in a kind of typical human way.
00:36:19.980 | And so this idea,
00:36:23.020 | if you're really going to have computation work efficiently,
00:36:26.040 | you are going to end up with these programs
00:36:28.320 | that are sort of just mined from the computational universe.
00:36:31.520 | And one of the issues with mining things,
00:36:33.600 | so that this makes use of computation much more efficiently
00:36:37.900 | than a typical thing that we might construct.
00:36:40.240 | Now, one feature of this is
00:36:41.640 | it's hard to understand what's going on.
00:36:43.560 | And there's actually a fundamental reason for that,
00:36:45.500 | which is in our efforts to sort of understand
00:36:47.960 | what's going on, we get to use our brains,
00:36:50.400 | our computers, our mathematics, or whatever.
00:36:52.720 | And our goal is this particular little program
00:36:56.340 | did a certain amount of computation
00:36:57.560 | to work out this pattern.
00:36:58.960 | The question is, can we kind of outrun that computation
00:37:02.220 | and say, oh, I can tell that actually
00:37:04.720 | this particular bit down here is going to be a black bit.
00:37:08.920 | You don't have to go and do all that computation.
00:37:11.320 | But it turns out that, and again,
00:37:13.640 | this will maybe is a digression,
00:37:15.560 | which there's this phenomenon I call
00:37:17.840 | computational irreducibility,
00:37:19.660 | which I think is really common.
00:37:21.200 | And it's a consequence of this thing
00:37:22.360 | I call principle of computational equivalence.
00:37:24.640 | And that principle of computational equivalence
00:37:27.100 | basically says, as soon as you have a system
00:37:29.920 | whose behavior isn't fairly easy to analyze,
00:37:33.280 | the chances are that the computation it's doing
00:37:35.840 | is essentially as sophisticated as it could be.
00:37:38.500 | And that has consequences like it implies
00:37:40.600 | that the typical thing like this
00:37:42.800 | will correspond to a universal computer
00:37:45.080 | that you can use to program anything.
00:37:47.440 | It also has the consequence of this
00:37:48.880 | computational irreducibility phenomenon
00:37:50.940 | that says you can't expect our brains
00:37:53.880 | to be able to outrun the computations
00:37:55.640 | that are going on inside the system.
00:37:57.520 | If there were computational reducibility,
00:38:00.160 | then we could expect that even though this thing went to a lot of trouble
00:38:02.920 | and did a million steps of evolution,
00:38:05.100 | just by using our brains
00:38:06.800 | we could jump ahead and see what the answer would be.
00:38:09.540 | Computational irreducibility suggests that isn't the case.
00:38:12.180 | If we're going to make the most efficient use
00:38:14.240 | of computational resources,
00:38:16.020 | we will inevitably run into computational irreducibility
00:38:18.660 | all over the place.
00:38:19.880 | It has the consequence that we get the situation
00:38:22.760 | where we can't readily sort of foresee
00:38:24.600 | and understand what's going to happen.
00:38:26.280 | So back to mathematics for a second.
00:38:28.640 | So this is just an axiom system that,
00:38:32.640 | so I looked for all possible,
00:38:34.200 | looked through sort of all possible axiom systems
00:38:36.960 | starting off with really tiny ones.
00:38:38.880 | And I asked the question,
00:38:40.000 | what's the first axiom system
00:38:42.160 | that corresponds to Boolean algebra?
00:38:44.560 | So it turns out this thing here,
00:38:46.360 | this tiny little thing here,
00:38:48.600 | generates all theorems of Boolean algebra.
00:38:50.400 | It is the simplest axiom for Boolean algebra.
00:38:53.480 | Now, something I have to show you this
00:38:55.120 | 'cause it's a new feature you see.
00:38:56.820 | If I say, find equational proof,
00:39:01.380 | let's say I want to prove commutativity
00:39:03.800 | of the NAND operation.
00:39:05.200 | I'm gonna show you something here.
00:39:06.960 | This is going to try to generate,
00:39:09.200 | let's see if this works.
00:39:11.040 | This is going to try to generate an automated proof
00:39:14.400 | based on that axiom system of that result.
00:39:16.980 | So it had 102 steps in the proof.
00:39:19.560 | And let's try and say,
00:39:21.720 | let's look at, for example, the proof network here.
00:39:24.560 | Actually, let's look at the proof dataset.
00:39:27.440 | No, that's not what I wanted.
00:39:28.920 | Oh, I should learn how to use this, shouldn't I?
00:39:32.200 | (audience laughing)
00:39:35.200 | Let's see.
00:39:37.720 | What I want is the,
00:39:40.360 | yeah, proof dataset.
00:39:42.600 | There we go.
00:39:43.440 | Very good.
00:39:44.260 | Okay, so this is,
00:39:47.160 | actually, let's say,
00:39:49.320 | first of all, let's say the proof graph.
00:39:51.760 | Okay, so this is gonna show me how that proof was done.
00:39:55.600 | So there are a bunch of lemmas that got proved,
00:39:58.560 | and from those lemmas, those lemmas were combined,
00:40:00.920 | and eventually it proved the result.
00:40:02.920 | So let's take a look at what some of those lemmas were.
00:40:06.160 | Okay, so here's the result.
00:40:10.120 | So after, so it goes through,
00:40:12.200 | and these are various lemmas it's using,
00:40:14.080 | and eventually, after many pages of nonsense,
00:40:17.360 | it will get to the result.
00:40:18.880 | Okay, each one of these,
00:40:19.800 | some of these lemmas are kind of complicated there.
00:40:22.240 | That's that lemma.
00:40:23.520 | It's a pretty complicated lemma,
00:40:25.280 | et cetera, et cetera, et cetera.
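
A hedged sketch of the proof demo; FindEquationalProof is the function used, the axiom is the single NAND axiom for Boolean algebra as usually quoted (CenterDot standing in for Nand), and the property names are from memory:

    axiom = ForAll[{p, q, r},
      CenterDot[CenterDot[CenterDot[p, q], r], CenterDot[p, CenterDot[CenterDot[p, r], p]]] == r];
    proof = FindEquationalProof[CenterDot[a, b] == CenterDot[b, a], axiom];
    proof["ProofGraph"]      (* the lemma-dependency graph *)
    proof["ProofDataset"]    (* every lemma, step by step *)
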
00:40:26.200 | So you might ask, what on earth is going on here?
00:40:28.840 | And the answer is,
00:40:29.740 | so I first generated a version of this proof 20 years ago,
00:40:32.680 | and I tried to understand what was going on,
00:40:34.300 | and I completely failed.
00:40:36.000 | And it's sort of embarrassing
00:40:37.200 | because this is supposed to be a proof.
00:40:39.360 | It's supposed to be demonstrating some results,
00:40:42.500 | and what we realize is that,
00:40:44.720 | you know, what does it mean to have a proof of something?
00:40:47.160 | What does it mean to explain how a thing is done?
00:40:50.280 | You know, what is the purpose of a proof?
00:40:52.080 | Purpose of a proof is basically
00:40:53.520 | to let humans understand why something is true.
00:40:56.760 | And so, for example, if you go to,
00:40:58.740 | let's say we go to Wolfram Alpha,
00:41:01.680 | and we do, you know, some random thing where we say,
00:41:05.520 | let's do, you know, an integral of something or another,
00:41:08.620 | it will be able to very quickly,
00:41:10.660 | in fact, it will take it only milliseconds internally
00:41:13.120 | to work out the answer to that integral, okay?
00:41:15.660 | But then somebody who wants to hand in a piece of homework
00:41:18.320 | or something like that needs to explain why is this true.
00:41:22.480 | Okay, well, we have this handy
00:41:24.960 | step-by-step solution thing here,
00:41:28.280 | which explains why it's true.
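
For comparison, the raw computation itself is a one-liner in the language; the integrand here is just an arbitrary example, not the one typed in the talk:

    Integrate[x^2 Sin[x], x]
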
00:41:32.520 | Now, the thing I should admit
00:41:33.880 | about the step-by-step solution is it's completely fake.
00:41:37.200 | That is, the steps that are described
00:41:39.120 | in the step-by-step solution
00:41:40.280 | have absolutely nothing to do with the way
00:41:41.860 | that internally that integral was computed.
00:41:44.560 | These are steps created purely for the purpose
00:41:47.360 | of telling a story to humans
00:41:49.340 | about why this integral came out the way it did.
00:41:52.120 | And now what we're seeing,
00:41:53.480 | and so that's a, so there's one thing is knowing the answer,
00:41:56.040 | the other thing is being able to tell a story
00:41:57.720 | about why the answer worked that way.
00:41:59.760 | Well, what we see here is this is a proof,
00:42:02.920 | but it was an automatically generated proof,
00:42:05.280 | and it's a really lousy story for us humans.
00:42:07.980 | I mean, if it turned out that one of these theorems here
00:42:10.640 | was one that had been proved by Gauss or something
00:42:13.480 | and appeared in all the textbooks,
00:42:15.400 | we would be much happier because then we would start
00:42:17.440 | to have a kind of human representable story
00:42:20.540 | about what was going on.
00:42:21.440 | Instead, we just get a bunch of machine-generated lemmas
00:42:24.120 | that we can't understand,
00:42:25.040 | that we can't kind of wrap our brains around.
00:42:27.360 | And it's sort of the same thing that's going on
00:42:30.240 | in when we look at one of these neural nets.
00:42:32.920 | We're seeing, you know, when we were looking
00:42:34.480 | wherever it was at the innards of that neural net,
00:42:37.400 | and we say, well, how is it figuring out
00:42:39.320 | that that's a picture of a panda?
00:42:40.920 | Well, the answer is it decided that, you know,
00:42:43.840 | if we humans were saying, how would you figure out
00:42:45.960 | if it's a picture of a panda?
00:42:46.960 | We might say, well, look and see if it has eyes.
00:42:50.040 | That's a clue for whether it's an animal.
00:42:51.560 | Look and see if it looks like it's kind of round
00:42:53.840 | and furry and things.
00:42:55.340 | That's a version of whether it's a panda
00:42:57.240 | and et cetera, et cetera, et cetera.
00:42:59.220 | But what it's doing is it learned a bunch of criteria
00:43:02.240 | for, you know, is it a panda or is it one of 10,000
00:43:04.880 | other possible things that it could have recognized?
00:43:07.160 | And it learned those criteria in a way
00:43:09.640 | that was somehow optimal based on the training
00:43:12.400 | that it got and so on.
00:43:13.560 | But it learned things which were distinctions
00:43:15.440 | which are different from the distinctions
00:43:16.960 | that we humans make in the language that we as humans use.
00:43:21.200 | And so in some sense, you know, when we start talking about,
00:43:24.620 | well, describe a picture, we have a certain human language
00:43:27.560 | for describing that picture.
00:43:28.920 | We have, you know, in our human, in typical human languages,
00:43:32.020 | we have maybe 30 to 50,000 words
00:43:34.040 | that we use to describe things.
00:43:35.840 | Those words are words that have sort of evolved
00:43:38.480 | as being useful for describing the world that we live in.
00:43:42.400 | When it comes to this neural net, it could be using,
00:43:46.040 | it could say, well, the words that it has effectively
00:43:49.720 | learned which allow it to make distinctions
00:43:52.360 | about what's going on in the analysis that it's doing,
00:43:56.500 | it has effectively invented words
00:43:59.020 | that describe distinctions, but those words have nothing
00:44:02.080 | to do with our historically invented words
00:44:04.620 | that exist in our languages.
00:44:06.160 | So it's kind of an interesting situation
00:44:07.720 | that it has its own way of thinking, so to speak.
00:44:10.800 | If you say, well, what's it thinking about?
00:44:12.300 | How do we describe what it's thinking?
00:44:14.120 | That's a tough thing to answer,
00:44:15.940 | because just like with the automated theorem,
00:44:19.220 | we're sort of stuck having to say,
00:44:22.520 | well, we can't really tell a human story
00:44:25.440 | because the things that it invented are things
00:44:27.780 | for which we don't even have words
00:44:29.060 | in our languages and so on.
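A small Wolfram Language sketch of that gap between learned features and human words, assuming the standard built-in test images are available (any handful of images would do):

    (* A few built-in test images standing in for photos to classify. *)
    imgs = ExampleData /@ {{"TestImage", "Mandrill"}, {"TestImage", "House"},
        {"TestImage", "Peppers"}, {"TestImage", "Sailboat"}};

    (* The net's best human-readable guess for one image... *)
    ImageIdentify[First[imgs]]

    (* ...versus the learned feature space it actually works in: the layout
       is driven by internal distinctions for which we have no everyday words. *)
    FeatureSpacePlot[imgs]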
00:44:31.200 | Okay, so one thing to realize is in this kind of space
00:44:36.240 | of sort of all possible computations,
00:44:38.680 | there's a lot of stuff out there that can be done.
00:44:40.880 | There's this kind of ocean of sophisticated computation.
00:44:44.880 | And then the question that we have to ask for us humans
00:44:49.680 | is, okay, how do we make use of all of that stuff?
00:44:53.200 | So what we've got kind of on the one hand
00:44:55.440 | is we've got the things we know how to think about,
00:44:58.520 | human languages, our way of describing things,
00:45:01.480 | our way of talking about stuff,
00:45:02.880 | that's the one side of things.
00:45:04.800 | The other side of things we have is this very powerful
00:45:07.360 | kind of seething ocean of computation on the other side
00:45:10.240 | where lots of things can happen.
00:45:12.000 | So the question is, how do we make use
00:45:13.800 | of this sort of ocean of computation
00:45:15.720 | in the best possible way for our human purposes
00:45:18.380 | and building technology and so on?
00:45:20.400 | And so the way I see my kind of part of what I've spent
00:45:25.400 | a very long time doing is kind of building a language
00:45:29.920 | that allows us to take human thinking on the one hand
00:45:33.440 | and describe and sort of provide
00:45:37.000 | a sort of computational communication language
00:45:39.840 | that allows us to get the benefit of what's possible
00:45:42.560 | over in the sort of ocean of computation
00:45:44.760 | in a way that's rooted in what we humans
00:45:48.080 | actually want to do.
00:45:49.480 | And so I kind of view Wolfram Language
00:45:52.040 | as being sort of an attempt to make a bridge between,
00:45:55.280 | so on the one hand, there's all possible computations.
00:45:58.760 | On the other hand, there's things we think we want to do.
00:46:01.800 | And I view Wolfram Language as being my best attempt
00:46:05.880 | right now to make a way to take our sort of human
00:46:10.480 | computational thinking and be able to actually implement it.
00:46:14.680 | So in a sense, it's a language which works on two sides.
00:46:18.760 | It's both a language where you as the machine
00:46:23.760 | can understand, okay, it's looking at this
00:46:28.320 | and that's what it's going to compute.
00:46:30.080 | But on the other hand, it's also a language for us humans
00:46:33.280 | to think about things in computational terms.
00:46:35.400 | So if I go and I, I don't know,
00:46:37.680 | one of these things that I'm doing here,
00:46:39.960 | whatever it is, this wasn't that exciting,
00:46:42.480 | but find shortest tour of the geo position
00:46:46.840 | of the capital cities in South America.
00:46:49.160 | That is a language, that's a representation
00:46:51.760 | and a precise language of something.
00:46:54.120 | And the idea is that that's a language
00:46:56.440 | which we humans can find useful
00:46:59.320 | in thinking about things in computational terms.
00:47:02.080 | It also happens to be a language
00:47:03.560 | that the machine can immediately understand and execute.
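The query typed in the demo corresponds roughly to the following Wolfram Language code (a hedged reconstruction, since the exact session isn't reproduced in the transcript; entity and property names are the standard ones but may differ slightly between versions):

    (* Capital cities of the South American countries, as canonical entities. *)
    capitals = EntityValue[EntityClass["Country", "SouthAmerica"], "CapitalCity"];

    (* Their geo positions. *)
    positions = EntityValue[capitals, "Position"];

    (* Shortest tour through those positions, then a map of the route. *)
    {dist, order} = FindShortestTour[positions];
    GeoListPlot[capitals[[order]], Joined -> True]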
00:47:06.560 | And so I think this is sort of a general,
00:47:08.840 | when I think about AI in general,
00:47:10.920 | what is the sort of what's the overall problem?
00:47:14.640 | Well, part of the overall problem is,
00:47:16.000 | so how do we tell the AIs what to do, so to speak?
00:47:19.400 | There's this very powerful,
00:47:21.400 | this sort of ocean of computation is what we get to mine
00:47:24.640 | for purposes of building AI kinds of things.
00:47:27.520 | But then the question is, how do we tell the AIs what to do?
00:47:30.800 | And what I see, what I've tried to do with Wolfram Language
00:47:35.440 | is to provide a way of kind of accessing that computation
00:47:40.440 | and sort of making use of the knowledge
00:47:43.960 | that our civilization has accumulated.
00:47:46.500 | And because that's the, you know,
00:47:49.240 | there's the general computation on this side,
00:47:52.080 | and there's the specific things
00:47:53.640 | that we humans have thought about.
00:47:55.480 | And the question is to make use of the things
00:47:58.480 | that we've thought about to do things
00:48:01.120 | that we care about doing.
00:48:01.960 | Actually, if you're interested in these kinds of things,
00:48:04.080 | I happen to just write a blog post last couple of days ago.
00:48:09.080 | It's kind of a funny blog post.
00:48:10.560 | It's about, well, you can see the title there.
00:48:13.360 | It came because a friend of mine has this crazy project
00:48:16.640 | to put little sort of disks or something
00:48:22.080 | that should represent kind of the best achievements
00:48:25.600 | of human civilization, so to speak,
00:48:27.440 | to send out, hitchhiking on various spacecraft
00:48:31.040 | that are going out into the solar system
00:48:33.880 | in the next little while.
00:48:35.040 | And the question is what to put on this little disk
00:48:37.160 | that kind of represents, you know,
00:48:39.120 | the achievements of civilization.
00:48:40.920 | It's kind of depressing when you go back
00:48:43.400 | and you look at what people have tried to do on this before
00:48:47.280 | and realizing how hard it is to tell
00:48:49.640 | even whether something is an artifact or not.
00:48:52.760 | But this was sort of a, yeah, that's a good one.
00:48:56.040 | That's from 11,000 years ago.
00:48:57.720 | The question is can you figure out what on earth it is
00:49:01.080 | and what it means?
00:49:02.080 | But so what's relevant about this is this whole question
00:49:09.840 | of there are things that are out there
00:49:12.040 | in the computational universe.
00:49:14.000 | And when we think about extraterrestrial intelligence,
00:49:17.480 | I find it kind of interesting that artificial intelligence
00:49:21.480 | is our first example of an alien intelligence.
00:49:24.520 | We don't happen to have found what we view
00:49:26.720 | as extraterrestrial intelligence right now,
00:49:28.680 | but we are in the process of building
00:49:30.560 | a pretty decent version of an alien intelligence here.
00:49:33.640 | And the question is if you ask questions like,
00:49:36.760 | well, you know, what is it thinking?
00:49:39.160 | Does it have a purpose in what it's doing and so on?
00:49:41.800 | And you're confronted with things like this.
00:49:43.200 | It's very, you can kind of do a test run
00:49:46.480 | of what's its purpose?
00:49:49.200 | What is it trying to do in a way that is very similar
00:49:52.680 | to the kinds of questions you would ask
00:49:54.080 | about extraterrestrial intelligence?
00:49:56.660 | But in any case, the main point is that I see
00:50:01.660 | this sort of ocean of computation.
00:50:05.240 | There's the let's describe what we actually want to do
00:50:07.800 | with that ocean of computation.
00:50:09.640 | And that's where, that's one of the primary problems
00:50:11.760 | we have.
00:50:12.600 | Now people talk about AI and what is AI going
00:50:15.160 | to allow us to automate?
00:50:16.880 | And my basic answer to that would be,
00:50:19.440 | we'll be able to automate everything that we can describe.
00:50:22.760 | The problem is it's not clear what we can describe.
00:50:25.680 | Or put another way, you imagine various jobs
00:50:28.480 | and people are doing things,
00:50:29.480 | they're repeated judgment jobs, things like this.
00:50:32.180 | They're where we can readily automate those things.
00:50:34.720 | But the thing that we can't really automate is saying,
00:50:37.520 | well, what are we trying to do?
00:50:39.320 | That is what are our goals?
00:50:41.120 | Because in a sense, when we see one of these systems,
00:50:44.120 | let's say it's a cellular automaton here.
00:50:48.080 | The question is, what is this cellular automaton
00:50:50.040 | trying to do?
00:50:51.520 | Maybe I'll give you another cellular automaton
00:50:53.560 | that is a little bit more exciting here.
00:50:55.760 | Let's do this one.
00:50:56.840 | So the question is, what is this cellular automaton
00:51:01.280 | trying to do?
00:51:02.720 | It's got this whole big structure here
00:51:04.760 | and things are happening with it.
00:51:05.880 | We can go, we can run it for a couple of thousand steps.
00:51:08.400 | We can ask, it's a nice example of kind of undecidability
00:51:11.360 | in action, what's gonna happen here?
00:51:13.160 | This is kind of the halting problem.
00:51:14.680 | Is this gonna halt?
00:51:15.520 | What's it gonna do?
00:51:17.280 | There's computational irreducibility,
00:51:18.880 | so we actually can't tell.
00:51:20.360 | There's a case where we know this is a universal computer,
00:51:22.480 | in fact, eventually, well, I won't even spoil it for you.
00:51:26.480 | If I went on long enough, it would go into some kind
00:51:29.920 | of cycle, but we can ask, what is this thing trying to do?
00:51:34.920 | What is it, you know, is it, what's it thinking about?
00:51:37.440 | What's its, you know, what's its goal?
00:51:39.480 | What's its purpose?
00:51:41.000 | And, you know, we get very quickly in a big mess
00:51:43.960 | thinking about those kinds of things.
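The experiment being run on screen can be reproduced with a couple of lines of Wolfram Language; rule 30 is used here only as a stand-in, since the transcript doesn't record which rule was shown:

    (* Run an elementary cellular automaton from a single black cell for a couple
       of thousand steps and plot the result: a simple rule, complicated behavior. *)
    ArrayPlot[CellularAutomaton[30, {{1}, 0}, 2000]]

    (* Whether and when a given rule settles into a cycle is, in general,
       subject to computational irreducibility: you just have to run it and see. *)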
00:51:45.280 | I've, one of the things that comes out of this principle
00:51:47.960 | of computational equivalence is thinking about
00:51:51.520 | what kinds of things have, are capable
00:51:56.000 | of sophisticated computation.
00:51:57.760 | So I mentioned a while back here,
00:52:01.600 | sort of my personal history with Wolfram Alpha
00:52:03.760 | of having thought about doing something like Wolfram Alpha
00:52:05.880 | when I was a kid and then believing that you sort of had
00:52:08.240 | to build a brain to make that possible and so on.
00:52:10.920 | And one of the things that I then thought was
00:52:14.160 | that there was some kind of bright line
00:52:16.200 | between what is intelligent
00:52:19.080 | and what is merely computational, so to speak.
00:52:22.040 | In other words, that there was something which is like,
00:52:23.920 | oh, we've got this great thing that we humans have
00:52:26.320 | that, you know, is intelligence and all these things
00:52:28.840 | in nature and so on and all the stuff that's going on there,
00:52:31.800 | it's just computation or it's just, you know,
00:52:34.320 | things operating according to rules, that's different.
00:52:36.360 | There's some bright line distinction between these things.
00:52:39.040 | Well, I think the thing that came about
00:52:41.720 | after I'd looked at all these cellular automata
00:52:44.200 | and all kinds of other things like that
00:52:46.160 | is I sort of came up with this principle
00:52:48.920 | of computational equivalence idea,
00:52:51.720 | which we've now got quite a lot of evidence for,
00:52:53.600 | which I can talk about if people are interested,
00:52:56.120 | but that basically there isn't a,
00:53:00.080 | that once you reach a certain level
00:53:01.920 | of computational sophistication, everything is equivalent.
00:53:05.560 | And that means that, that implies
00:53:08.160 | that there really isn't a bright line distinction
00:53:10.320 | between, for example, the computations going on
00:53:12.160 | in our brains and the computations going on
00:53:14.400 | in the simple cellular automata and so on.
00:53:16.280 | And that essentially philosophical point
00:53:18.600 | is what actually got me to start trying to build
00:53:20.440 | Wolfram Alpha, because I realized that, gosh, you know,
00:53:23.480 | I'd been looking for this sort of,
00:53:25.080 | the magic bullets of intelligence,
00:53:27.000 | and I just decided probably there isn't one.
00:53:29.160 | And actually it's all just computation.
00:53:31.600 | And so that means we can actually in practice
00:53:33.440 | build something that does this kind of intelligent
00:53:35.920 | like thing, and so that's what I think is the case,
00:53:39.560 | is that there really isn't sort of a bright line distinction
00:53:41.960 | and that has more extreme consequences.
00:53:44.360 | Like people will say things like, you know,
00:53:46.160 | the weather has a mind of its own, okay?
00:53:48.840 | Sounds kind of silly, sounds kind of animistic,
00:53:51.040 | primitive and so on, but in fact, the, you know,
00:53:54.240 | fluid dynamics of the weather is as computationally
00:53:57.880 | sophisticated as the stuff that goes on in our brains.
00:54:01.680 | But we can start asking, but then you say,
00:54:03.880 | but the weather doesn't have a purpose.
00:54:05.920 | You know, what's the purpose of the weather?
00:54:07.200 | Well, you know, maybe the weather is trying to equalize
00:54:10.360 | the temperature between the, you know,
00:54:11.920 | the North Pole and the tropics or something.
00:54:15.600 | And then we have to say, well, but that's not a purpose
00:54:17.880 | in the way that we think about purposes.
00:54:19.560 | That's just, you know, and we get very confused.
00:54:22.240 | And in the end, what we realize is when we're talking
00:54:24.720 | about things like purposes, we have to have this kind
00:54:27.520 | of chain of provenance that goes back to humans
00:54:31.520 | and human history and all that kind of thing.
00:54:33.760 | And I think it's the same type of thing when we talk
00:54:35.760 | about computation and AI and so on.
00:54:37.960 | The thing that we, this question of sort of purpose,
00:54:41.720 | goals, things like this, that's the thing
00:54:44.200 | which is intrinsically human and not something
00:54:47.360 | that we can ever sort of automatically generate.
00:54:49.240 | It makes no sense to talk about automatically generating it
00:54:51.880 | because these computational systems,
00:54:53.480 | they do all kinds of stuff.
00:54:55.000 | You know, we can say they've got a purpose,
00:54:56.500 | we can attribute purposes to them, et cetera, et cetera,
00:54:58.560 | et cetera, but, you know, ultimately it's sort of
00:55:00.960 | a human thread of purpose that we have to deal with.
00:55:04.000 | So that means, for example, when we talk about AIs
00:55:06.720 | and we're interested in things like, so how do we tell,
00:55:09.720 | you know, like we'd like to be able to tell,
00:55:12.040 | we talk about AI ethics, for example.
00:55:14.400 | We'd like to be able to make a statement to the AIs
00:55:17.240 | like, you know, please be nice to us humans.
00:55:20.500 | And that's a, you know, that's something,
00:55:24.560 | so one of the issues there is,
00:55:26.360 | so talking about that kind of thing,
00:55:30.140 | one of the issues is how are we going to make a statement
00:55:32.640 | like be nice to us humans?
00:55:34.820 | What's the, you know, how are we going to explain that
00:55:37.640 | to an AI?
00:55:38.960 | And this is where, again, you know, my efforts
00:55:42.960 | to build a language, a computational communication language
00:55:46.540 | that bridges the world of what we humans think about
00:55:50.500 | and the world of what is possible in computation
00:55:52.740 | are important, and so one of the things
00:55:54.500 | I've been interested in is actually building
00:55:56.640 | what I call a symbolic discourse language
00:55:58.900 | that can be a general representation
00:56:01.200 | for sort of the kinds of things that we might want to put in,
00:56:05.700 | that we might want to say in things like be nice to humans.
00:56:11.360 | So sort of a little bit of background to that.
00:56:13.740 | So, you know, in the modern world,
00:56:15.480 | people are keen on smart contracts.
00:56:17.740 | They often think of them as being deeply tied
00:56:19.540 | into blockchain, which I don't think is really quite right.
00:56:22.020 | The important thing about smart contracts
00:56:24.240 | is it's a way of having sort of an agreement
00:56:27.740 | between parties which can be executed automatically,
00:56:31.200 | and that agreement may be, you know,
00:56:33.040 | you may choose to sort of anchor that agreement
00:56:36.600 | in a blockchain, you may not,
00:56:38.240 | but the whole point is you have to,
00:56:39.760 | what you, you know, when people write legal contracts,
00:56:42.560 | they write them in an approximation to English.
00:56:44.640 | They write them in legalese typically
00:56:46.600 | 'cause they're trying to write them
00:56:47.420 | in something a little bit more precise than regular English,
00:56:50.060 | but the limiting case of that is to make
00:56:53.220 | a symbolic discourse language in which you can write
00:56:56.200 | the contract in code basically.
00:56:58.560 | And I've been very interested in using Wolfram Language
00:57:01.980 | to do that because in Wolfram Language,
00:57:03.840 | we have a language which can describe things about the world
00:57:07.460 | and we can talk about the kinds of things
00:57:10.240 | that people actually talk about in contracts and so on.
00:57:13.000 | And we're most of the way there to being able to do that.
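As a flavor of what a contract term might look like once it is written in code, here is a purely illustrative sketch: the heads DeliveryObligation, Party, and Deadline are hypothetical, not built-in Wolfram Language functions; only Quantity, Entity, and DateObject are real. In Wolfram Language an undefined head simply stays as an inert symbolic expression, which is the point.

    (* Hypothetical symbolic rendering of "Party A delivers 100 kg of cocoa
       to Party B in Boston by March 1": structure rather than legalese. *)
    term = DeliveryObligation[
       Party["A"],
       Party["B"],
       Quantity[100, "Kilograms"],
       Entity["City", {"Boston", "Massachusetts", "UnitedStates"}],
       Deadline[DateObject[{2018, 3, 1}]]
      ];

    (* Being symbolic, such terms can be queried and transformed programmatically. *)
    Cases[term, _Quantity, Infinity]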
00:57:16.200 | And then when you start thinking about that,
00:57:19.760 | you start thinking about, okay,
00:57:21.160 | so we've got this language to describe things
00:57:24.300 | that we care about in the world.
00:57:26.700 | And so when it comes to things like tell the AIs
00:57:29.420 | to be nice to the humans,
00:57:31.100 | we can imagine using Wolfram Language
00:57:33.700 | to sort of build an AI constitution that says
00:57:36.220 | this is how the AI is supposed to work.
00:57:38.320 | But when we talk about sort of just the untethered,
00:57:42.020 | you know, the untethered AI doesn't have any particular,
00:57:45.260 | it's just gonna do what it does.
00:57:47.100 | And if we want it to, you know,
00:57:48.860 | if we want to somehow align it with human purposes,
00:57:51.600 | we have to have some way to sort of talk to the AI.
00:57:54.640 | And that's, you know, I view my efforts
00:57:58.800 | to build Wolfram Language as a way to do that.
00:58:01.160 | I mean, you know, as I was showing at the beginning,
00:58:03.880 | you can use, you can take natural language
00:58:07.880 | and with natural language,
00:58:09.400 | you can build up a certain amount of,
00:58:11.760 | you can say a certain number of things in natural language.
00:58:14.080 | You can then say, well, how do we make this more precise
00:58:16.600 | in a precise symbolic language?
00:58:18.480 | If you want to build up more complicated things,
00:58:20.900 | it gets hard to do that in natural language.
00:58:23.220 | And so you have to kind of build up more serious programs
00:58:26.260 | in symbolic language.
00:58:29.340 | And I've probably been yakking a while here
00:58:33.220 | and I'm happy to, I can talk about
00:58:35.820 | all kinds of different things here,
00:58:37.060 | but maybe I've not seen as many reactions
00:58:40.260 | as I might've expected, I think.
00:58:41.780 | So I'm not sure which things people are interested
00:58:44.140 | in and which they're not.
00:58:44.980 | But so maybe I should stop here
00:58:48.020 | and we can have discussion, questions, comments.
00:58:51.560 | (audience applauding)
00:58:54.120 | - Yes, two microphones if you have questions,
00:58:56.800 | please come up.
00:58:58.200 | - So I have a quick question.
00:58:59.800 | It goes to the earlier part of your talk
00:59:01.600 | where you say you don't build a top-down ontology,
00:59:04.280 | you actually build from the bottom up
00:59:06.040 | with disparate domains.
00:59:08.120 | What do you feel are the core technologies
00:59:10.120 | of the knowledge representation
00:59:11.520 | which you use within Wolfram Alpha
00:59:13.640 | that allows you, you know, different domains
00:59:16.060 | to reason about each other, to come up with solutions?
00:59:18.400 | And is there any feeling of differentiability,
00:59:21.220 | for example, so if you were to come up with a plan
00:59:24.320 | to do something new within Wolfram Alpha language,
00:59:28.020 | you know, how would you go about doing that?
00:59:30.340 | - Okay, so we've done maybe a couple of thousand domains.
00:59:34.920 | What is actually involved in doing one of these domains?
00:59:40.020 | It's a gnarly business.
00:59:42.560 | Every domain has some crazy different thing about it.
00:59:45.620 | I tried to make up actually a while ago,
00:59:47.860 | let me show you something,
00:59:50.660 | a kind of a hierarchy of what it means to make,
00:59:54.300 | see if I can find this here,
00:59:55.940 | kind of a hierarchy of what it means
00:59:57.520 | to make a domain computable.
00:59:59.440 | Where is it?
01:00:01.900 | There we go.
01:00:02.740 | Okay, here we go.
01:00:09.940 | So this is sort of a hierarchy of levels
01:00:11.860 | of what it means to make a domain computable
01:00:13.980 | from just, you know, you've got some array of data
01:00:18.900 | that's quite structured.
01:00:19.900 | Forget, you know, the separate issue
01:00:21.740 | about extracting things from unstructured data,
01:00:24.220 | but let's imagine that you were given,
01:00:26.060 | you know, a bunch of data about landing sites
01:00:30.340 | of meteorites or something, okay?
01:00:32.500 | So you go through various levels.
01:00:33.900 | So, you know, things like, okay,
01:00:36.460 | the landing sites of the meteorites,
01:00:37.780 | are the positions just strings,
01:00:40.580 | or are they some kind of canonical representation
01:00:42.500 | of geoposition?
01:00:43.860 | Is the, you know, is the type of meteorite,
01:00:46.580 | you know, some of them are iron meteorites,
01:00:48.220 | some of them are stone meteorites.
01:00:49.700 | Have you made a canonical representation?
01:00:52.260 | Have you made some kind of way to identify what--
01:00:57.260 | - Sorry, go ahead.
01:00:59.260 | - No, no, I mean, to do that, so--
01:01:01.060 | - So my question is like, you know,
01:01:02.020 | if you did have positions as a string
01:01:03.980 | as well as a canonical representation,
01:01:05.900 | do you have redundant pieces of the same,
01:01:08.460 | redundant representations of the same information
01:01:11.380 | in the different--
01:01:13.780 | - No, I mean, our goal--
01:01:15.380 | - Is everything canonical that you have?
01:01:16.820 | Do you have a minimal representation of everything?
01:01:18.780 | - Yeah, our goal is to make everything canonical.
01:01:21.300 | Now, that's, you know, there is a lot of complexity
01:01:24.620 | in doing that.
01:01:25.460 | I mean, if you, you know, in each,
01:01:27.300 | okay, so another feature of these domains.
01:01:29.660 | Okay, so here's another thing to say.
01:01:32.020 | You know, it would be lovely
01:01:35.020 | if one could just automate everything
01:01:36.380 | and cut the humans out of the loop.
01:01:38.380 | Turns out this doesn't work.
01:01:40.540 | And in fact, whenever we do these domains,
01:01:42.860 | it's fairly critical to have expert humans
01:01:45.420 | who really understand the domain
01:01:46.740 | or you simply get it wrong.
01:01:48.500 | And it's also, having said that,
01:01:51.100 | once you've done enough domains,
01:01:52.340 | you can do a lot of cross-checking between domains
01:01:54.700 | and we are the number one reporters
01:01:57.980 | of errors in pretty much all standardized data sources
01:02:01.940 | because we can do that kind of cross-checking.
01:02:04.460 | But I think, you know, if you ask the question,
01:02:07.140 | what's involved in bringing online a new domain,
01:02:12.460 | it's, you know, those sort of hierarchy of things,
01:02:15.860 | you know, some of those take a few hours.
01:02:18.020 | You can get to the point of having,
01:02:20.300 | you know, we've got good enough tools for ingesting data,
01:02:23.300 | figuring out, oh, those are names of cities in that column.
01:02:25.980 | Let's, you know, let's canonicalize those.
01:02:28.700 | You know, some may be questions,
01:02:29.940 | but many of them we'll be able to nail down.
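The ingest-and-canonicalize step being described corresponds to built-in machinery like Interpreter and SemanticImport; a minimal sketch (the file name is hypothetical):

    (* Turn a free-form string into a canonical entity, then into a geo position. *)
    city = Interpreter["City"]["cambridge ma"]
    GeoPosition[city]

    (* SemanticImport applies the same kind of interpretation column by column,
       guessing which columns are cities, dates, quantities and so on. *)
    meteorites = SemanticImport["meteorite-landings.csv"]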
01:02:32.860 | And to get to the full level
01:02:34.900 | of you've got some complicated domain
01:02:36.900 | and it's fully computable is probably a year of work.
01:02:41.460 | And you might say, well, gosh,
01:02:43.900 | why are you wasting your time?
01:02:45.300 | You've got to be able to automate that.
01:02:46.700 | So you can probably tell we're fairly sophisticated
01:02:48.780 | about machine learning kinds of things and so on.
01:02:50.900 | And we have tried, you know, to automate as much as we can.
01:02:54.660 | And we have got a pretty efficient pipeline,
01:02:57.140 | but if you actually want to get it right,
01:02:59.100 | and you see, here's an example of what happens.
01:03:02.060 | There's a level, even going between Wolfram Alpha
01:03:04.620 | and Wolfram Language, there's a level of,
01:03:07.140 | so for example, let's say you're looking at,
01:03:09.660 | you know, lakes in Wisconsin, okay?
01:03:12.500 | So people are querying about lakes in Wisconsin
01:03:14.660 | and Wolfram Alpha, they'll name a particular lake
01:03:17.420 | and they want to know, you know, how big is the lake?
01:03:20.100 | Okay, fine.
01:03:21.220 | In Wolfram Language, they'll be doing
01:03:23.020 | a systematic computation about lakes in Wisconsin.
01:03:25.980 | So if there's a lake missing,
01:03:27.740 | you're gonna get the wrong answer.
01:03:29.380 | And so that's a kind of higher level of difficulty.
01:03:33.700 | - Okay.
01:03:34.540 | - But there's, yeah, I think you're asking
01:03:37.140 | some more technical questions about ontologies
01:03:38.900 | and I can try and answer those.
01:03:40.260 | - Actually, one quick question.
01:03:41.660 | Can you-- - Wait, wait, wait, wait, wait.
01:03:43.140 | No, there's a lot of other questions.
01:03:45.060 | - Yeah, that's fine. - Okay.
01:03:45.900 | - Thank you very much, that was a great question.
01:03:47.580 | - We'll recycle this.
01:03:48.620 | - To the left here, please.
01:03:50.180 | - I've got a simple question.
01:03:51.700 | Who or what are your key influences?
01:03:54.260 | - Oh gosh.
01:03:56.700 | In terms of language design for Wolfram Language,
01:04:00.820 | for example-- - So in the context
01:04:01.820 | of machine intelligence, if you like,
01:04:03.140 | if you want to make it tailored to this audience.
01:04:08.180 | - I don't know, I've been absorbing stuff forever.
01:04:10.660 | I think my main, in terms of language design,
01:04:14.220 | probably Lisp and APL were my sort of early influences.
01:04:19.220 | But in terms of thinking about AI, hmm.
01:04:24.140 | You know, in, I mean, I'm kind of quite knowledgeable.
01:04:30.820 | I like history of science.
01:04:33.860 | I'm pretty knowledgeable about the early history
01:04:36.540 | of kind of mathematical logic, symbolic kinds of things.
01:04:40.020 | I would say, okay, maybe I can answer that in the negative.
01:04:43.540 | I have, for example, in building Wolfram Alpha,
01:04:46.180 | I thought, gosh, let me do my homework,
01:04:49.620 | let me learn all about computational linguistics,
01:04:51.420 | let me hire some computational linguistics PhDs.
01:04:54.220 | That will be a good way to get this started.
01:04:56.260 | Turns out, we used almost nothing
01:04:58.700 | from the previous sort of history
01:05:01.220 | of computational linguistics,
01:05:02.700 | partly because what we were trying to do,
01:05:04.100 | namely short question natural language understanding,
01:05:07.100 | is different from a lot of the natural language processing,
01:05:09.540 | which has been done in the past.
01:05:11.460 | I also have made, to my disappointment,
01:05:14.780 | very little use of, you know, people like Marvin Minsky,
01:05:18.900 | for example, I really don't think,
01:05:21.020 | I mean, I knew Marvin for years,
01:05:22.540 | and in fact, some of his early work
01:05:24.380 | on simple Turing machines and things,
01:05:26.660 | those are probably more influential to me
01:05:29.100 | than his work on AI.
01:05:31.380 | And, you know, probably my mistake
01:05:34.340 | of not understanding that better,
01:05:36.100 | but really, I would say that I've been rather uninfluenced
01:05:39.340 | by sort of the traditional AI kinds of things.
01:05:42.780 | I mean, it probably hasn't helped
01:05:43.900 | that I've kind of lived through a time
01:05:45.580 | when sort of AI went from, you know,
01:05:48.380 | when I was a kid, AI was gonna solve everything in the world
01:05:50.820 | and then, you know, it kind of decayed for a while
01:05:53.180 | and then sort of come back.
01:05:54.540 | So I would say that I can describe my negative,
01:05:57.660 | my non-influences, better than my influences.
01:05:59.700 | - The impression you give is that you made it up
01:06:01.060 | out of your own head,
01:06:01.900 | and it sounds as though that's pretty much right.
01:06:04.620 | - Yeah, I mean, yes.
01:06:05.940 | I mean, insofar as there's things to,
01:06:08.500 | I mean, look, things like the, you know,
01:06:13.220 | okay, so for example, studying simple programs
01:06:16.060 | and trying to understand the universe of simple programs,
01:06:19.060 | actually, the personal history of that
01:06:20.860 | is sort of interesting.
01:06:21.700 | I mean, I, you know, I used to do particle physics
01:06:26.060 | when I was a kid, basically,
01:06:27.860 | and then I actually got interested,
01:06:31.540 | okay, so I'll tell you the history of that,
01:06:32.820 | just as an example of how sort of interesting
01:06:34.820 | as a sort of history of ideas type thing.
01:06:36.980 | So I was interested in how order arises in the universe.
01:06:40.860 | So, you know, you start off from the hot Big Bang
01:06:43.340 | and then pretty soon you end up with a bunch of humans
01:06:46.020 | and galaxies and things like this.
01:06:47.300 | How does this happen?
01:06:48.820 | So I got interested in that question.
01:06:50.580 | I was also interested in things like neural networks
01:06:54.340 | for sort of AI purposes,
01:06:56.300 | and I thought, let me make a minimal model
01:06:59.460 | that encompasses sort of how complex things arise
01:07:02.780 | from other stuff,
01:07:04.860 | and I ended up sort of making simpler and simpler
01:07:07.940 | and simpler models and eventually wound up
01:07:09.580 | with cellular automata,
01:07:11.060 | and which I didn't know were called cellular automata
01:07:13.060 | when I started looking at them
01:07:14.660 | and then found they did interesting things,
01:07:16.660 | and the two areas where cellular automata
01:07:18.500 | have been singularly unuseful in analyzing things
01:07:22.820 | are large scale structure in the universe
01:07:25.580 | and neural networks.
01:07:26.980 | So it turned out, but that, by the way,
01:07:30.140 | the fact that I kind of even imagined
01:07:32.020 | that one could just start, yeah, I should say,
01:07:34.540 | you know, I've been doing physics,
01:07:36.340 | and in physics, the kind of intellectual concept
01:07:39.460 | is you take the world as it is
01:07:41.220 | and you try and drill down and find out what,
01:07:43.460 | you know, what makes the world out of primitives and so on.
01:07:46.100 | It's kind of a, you know, reduced to find things.
01:07:48.940 | Then I built my first computer language,
01:07:51.180 | I think called SMP, which went the other way around,
01:07:53.860 | where I was just like, I'm just gonna make up
01:07:55.340 | this computer language and, you know,
01:07:57.900 | just make up what I want the primitives to be
01:07:59.700 | and then I'm gonna build stuff up from it.
01:08:01.500 | I think that the fact that I kind of had the idea
01:08:04.140 | of doing things like making up cellular automata
01:08:06.580 | as possible models for the world
01:08:08.580 | was a consequence of the fact that I worked
01:08:09.940 | on this computer language, which was a thing
01:08:11.940 | which worked the opposite way around
01:08:13.500 | from the way that one is used to doing natural science,
01:08:16.060 | which is sort of this reductionist approach.
01:08:18.420 | And that's, I mean, so that's just an example
01:08:21.020 | of, you know, I found, I happen to have spent
01:08:25.020 | a bunch of time studying, as I say, history of science.
01:08:28.180 | And one of my hobbies is sort of history of ideas.
01:08:31.660 | I even wrote this little book called Idea Makers,
01:08:33.940 | which is about biographies of a bunch of people
01:08:36.220 | who for one reason or another I've written about.
01:08:38.220 | And so I'm always curious about this thing
01:08:40.180 | about how do people actually wind up figuring out
01:08:42.260 | the things they figure out.
01:08:43.780 | And, you know, one of the conclusions of my,
01:08:47.660 | you know, investigations of many people
01:08:49.460 | is there are very rarely moments of inspiration.
01:08:53.420 | Usually it's long, multi-decade kinds of things,
01:08:56.900 | which only later get compressed into something short.
01:09:00.220 | And also the path is often much, you know,
01:09:05.220 | it's quite, what can I say, that the steps are quite small,
01:09:10.260 | and, you know, but the path is often kind of complicated.
01:09:14.260 | And that's what it's been for me.
01:09:15.820 | So I-
01:09:16.660 | - Simple question, complex answer.
01:09:17.780 | - Sorry about that.
01:09:18.620 | (laughing)
01:09:19.460 | - Go ahead, please.
01:09:20.780 | - Hello.
01:09:21.620 | So what I basically see from the Wolfram language
01:09:24.660 | is it's a way to describe all of objective reality.
01:09:27.300 | It's kind of formalizing just about the entire domain
01:09:30.580 | of discourse, to use a philosophical term.
01:09:32.300 | And you kind of hinted at this in your lecture
01:09:34.940 | where it sort of leaves off,
01:09:36.500 | is that when we start to talk about
01:09:37.820 | more esoteric philosophical concepts, purpose,
01:09:41.420 | I guess this would lead into things like epistemology,
01:09:44.020 | because essentially you only have science there.
01:09:45.620 | And as amazing as science is,
01:09:46.920 | there are other things that are talked about,
01:09:48.440 | not, you know, like idealism versus materialism, et cetera.
01:09:52.700 | Do you have an idea of how Wolfram might or might not
01:09:56.900 | be able to branch into those discourses?
01:09:58.540 | Because I'm hearing echoes in my head of that time
01:10:01.020 | Bostrom said that an AI needs a, you know,
01:10:03.560 | when you give an AI a purpose, there's like,
01:10:05.440 | I think he said philosophers are divided completely evenly
01:10:08.500 | between the top four ways to measure
01:10:10.060 | how good something should be.
01:10:11.120 | It's like utilitarianism and-
01:10:12.900 | - Sure.
01:10:13.740 | - Do you have the four minus Japanese?
01:10:14.840 | - Yeah, right.
01:10:15.680 | So the first thing is, I mean,
01:10:16.580 | this problem of making what, okay,
01:10:19.720 | about 300 years ago, people like Leibniz
01:10:21.960 | were interested in the same problem that I'm interested in,
01:10:23.980 | which is how do you formalize sort of everyday discourse?
01:10:27.440 | And Leibniz had the original idea, you know,
01:10:29.400 | he was originally trained as a lawyer,
01:10:31.260 | and he had this idea, if he could only reduce all law,
01:10:34.520 | all legal questions to matters of logic,
01:10:37.240 | he could have a machine that would basically describe,
01:10:39.320 | you know, answer every legal case, right?
01:10:43.040 | He was unfortunately a few hundred years too early,
01:10:46.260 | even though he did have, you know, he tried to,
01:10:48.320 | he tried to do all kinds of things,
01:10:49.440 | very similar to things I've tried to do,
01:10:51.200 | like he tried to get various dukes
01:10:53.000 | to assemble big libraries of data and stuff like this,
01:10:56.480 | but the point, so what he tried to do
01:10:59.800 | was to make a formalized representation of everyday discourse
01:11:04.400 | for whatever reason, for the last 300 years,
01:11:06.480 | basically people haven't tried to do that.
01:11:08.560 | There's, it's an almost completely barren landscape.
01:11:11.700 | There was this period of time in the 1600s
01:11:14.480 | when people talked about philosophical languages.
01:11:17.320 | Leibniz was one, a guy called John Wilkins was another,
01:11:20.600 | and they tried to, you know, break down human thought
01:11:23.880 | into something symbolic.
01:11:25.760 | People haven't done that for a long time.
01:11:28.560 | In terms of what can we do that with, you know,
01:11:32.100 | I've been trying to figure out
01:11:33.720 | what the best way to do it is.
01:11:34.840 | I think it's actually not as hard as one might think.
01:11:37.720 | These areas, one thing you have to understand,
01:11:39.720 | these areas like philosophy and so on,
01:11:41.960 | are, they're on the harder end.
01:11:43.800 | I mean, things like, a good example, typical example,
01:11:46.800 | you know, I want to have a piece of chocolate, okay?
01:11:49.960 | The, in Wolfram language right now,
01:11:51.860 | we have a pretty good description of pieces of chocolate.
01:11:54.560 | We know all sorts of, you know,
01:11:55.980 | we probably know 100 different kinds of chocolate.
01:11:58.360 | We know how big the pieces are, all that kind of thing.
01:12:01.380 | The "I want" part of that sentence,
01:12:03.180 | we can't do that right now,
01:12:05.260 | but I don't think that's that hard, and I'm, you know,
01:12:08.160 | that's, now if you ask, let's say we had,
01:12:12.240 | I think the different thing you're saying is,
01:12:14.360 | let's say we had the omnipotent AI, so to speak,
01:12:17.720 | that was able to, you know,
01:12:19.200 | where we turn over the control of the central bank
01:12:21.520 | to the AI, we turn over all these other things to the AI.
01:12:24.640 | Then the question is, we say to the AI,
01:12:26.840 | now do the right thing.
01:12:28.800 | And then the problem with that is,
01:12:31.600 | and this is why I talk about, you know,
01:12:33.060 | creating AI constitutions and so on,
01:12:36.000 | we have absolutely no idea
01:12:38.240 | what "do the right thing" is supposed to mean.
01:12:39.760 | And philosophers have been arguing about that,
01:12:41.520 | you know, utilitarianism is an example of that,
01:12:44.280 | of one of the answers to that,
01:12:46.280 | although it's not a complete answer by any means,
01:12:48.280 | it's not really an answer,
01:12:49.520 | it's just a way of posing the question.
01:12:51.960 | And so I think that the, you know,
01:12:53.760 | one of the features of,
01:12:56.120 | so I think it's a really hard problem to, you know,
01:13:00.240 | you think to yourself,
01:13:01.120 | what should the AI constitution actually say?
01:13:03.400 | So first thing you might think is,
01:13:05.240 | oh, there's going to be, you know,
01:13:06.480 | something like Asimov's laws of robotics.
01:13:08.200 | There's going to be one, you know, golden rule for AIs.
01:13:11.960 | And if we just follow that golden rule, all will be well.
01:13:15.320 | Okay, I think that that is absolutely impossible.
01:13:18.880 | And in fact, I think you can even sort of mathematically
01:13:20.760 | prove that that's impossible.
01:13:22.960 | Because I think as soon as you have a system that,
01:13:26.280 | you know, essentially what you're trying to do
01:13:27.800 | is you're trying to put in constraints that,
01:13:31.840 | okay, basically, as soon as you have a system
01:13:33.680 | that shows computational irreducibility,
01:13:35.840 | I think it is inevitable that you have
01:13:38.600 | unintended consequences of things,
01:13:41.920 | which means that you never get to just say,
01:13:45.000 | put everything in this one very nice box.
01:13:47.700 | You always have to say, let's put in a patch here,
01:13:49.680 | let's put in a patch there, and so on.
01:13:51.560 | A version of this, a much more abstract version of this,
01:13:54.040 | is Godel's theorem.
01:13:55.300 | So Godel's theorem is, you know,
01:13:57.760 | it starts off by taking the, you know,
01:14:01.120 | Godel's theorem is trying to talk about integers.
01:14:04.400 | It says, start off with Peano's axioms.
01:14:07.280 | Peano's axioms, you might say, as Peano thought,
01:14:10.400 | describe the integers and nothing but the integers.
01:14:13.240 | Okay, so anything that's provable from Peano's axioms
01:14:17.120 | will be true about integers and vice versa, okay?
01:14:19.760 | What Godel's theorem shows is that that will never work,
01:14:22.840 | that there are an infinite hierarchy of patches
01:14:25.520 | that you have to put on to Peano's axioms
01:14:27.800 | if you want to describe the integers
01:14:29.400 | and nothing but the integers.
01:14:30.920 | And I think the same is true if you want to have
01:14:32.980 | a legal system effectively
01:14:34.880 | that has no bizarre unintended consequences.
01:14:37.720 | So I don't think it's possible to just say, you know,
01:14:40.200 | if you, when you're describing something in the world
01:14:43.040 | that's complicated like that,
01:14:43.920 | I don't think it's possible to just have
01:14:45.840 | a small set of rules that will always do what we want,
01:14:49.840 | so to speak.
01:14:50.680 | I think it's inevitable that you have to have a long,
01:14:52.640 | essentially, code of laws, and that's what, you know,
01:14:55.960 | so my guess is that what will actually have to happen is,
01:14:58.680 | you know, as we try and describe
01:14:59.920 | what should we want the AIs to do,
01:15:02.440 | you know, I don't know the sociopolitical aspects
01:15:04.580 | of how we'll figure out whether it's one AI constitution
01:15:09.220 | or one per, you know, city or whatever.
01:15:13.860 | We can talk about that, that's a separate issue,
01:15:15.540 | but, you know, I think what will happen is
01:15:18.300 | it'll be much like human laws.
01:15:19.640 | It'll be a complicated thing
01:15:20.780 | that gets progressively patched.
01:15:22.500 | And so I think it's some, and these ideas like,
01:15:25.560 | you know, oh, we'll just make the AIs, you know,
01:15:29.160 | run the world according to, you know, Mill's,
01:15:32.780 | you know, John Stuart Mill's idea, it's not gonna work.
01:15:36.500 | Which is not surprising, 'cause philosophy
01:15:39.260 | has made the point that it's not an easy problem
01:15:42.740 | for the last 2,000 years, and they're right.
01:15:45.300 | It's not an easy problem.
01:15:47.140 | - Thank you. - Yeah.
01:15:48.140 | - Hi, you're talking about computational irreducibility
01:15:53.460 | and computational equivalence, and also that earlier on
01:15:57.100 | in your intellectual adventures,
01:15:59.320 | you're interested in particle physics and things like that.
01:16:02.640 | I've heard you make the comment before in other contexts
01:16:06.560 | that things like molecules compute,
01:16:10.660 | and I was curious to ask you exactly what you mean by that,
01:16:13.420 | in what sense does a molecule--
01:16:16.380 | - I mean, what would you like to compute, so to speak?
01:16:21.380 | I mean, in other words, what is the case is that,
01:16:26.880 | you know, one definition of your computing
01:16:29.020 | is given a particular computation, like, I don't know,
01:16:31.460 | finding square roots or something, you know,
01:16:34.020 | you can program a, you know, the surprising thing
01:16:38.500 | is that an awful lot of stuff can be programmed
01:16:41.680 | to do any computation you want.
01:16:43.900 | That's some, and, you know, when it comes to,
01:16:46.780 | I mean, I think, for example, when you look
01:16:48.020 | at nanotechnology and so on, the current,
01:16:52.180 | you know, one of the current beliefs is
01:16:54.300 | to make very small computers, you should take
01:16:57.360 | what we know about making big computers
01:16:59.740 | and just, you know, make them smaller, so to speak.
01:17:03.120 | I don't think that's the approach you have to use.
01:17:05.340 | I think you can take the components that exist
01:17:07.740 | at the level of molecules and say,
01:17:09.980 | how do we assemble those components
01:17:12.680 | to be able to do complicated computations?
01:17:14.460 | I mean, it's like the cellular automata,
01:17:16.220 | that the, you know, the underlying rule
01:17:19.820 | for the cellular automaton is very simple,
01:17:21.900 | yet when that rule is applied many times,
01:17:24.780 | it can do a sophisticated computation.
01:17:26.880 | So I think that that's the sense in which,
01:17:29.500 | what can I say, the raw material that you need
01:17:33.100 | for computation can be, you know,
01:17:36.260 | there's a great diversity in the raw material
01:17:38.140 | that you can use for computation.
01:17:39.340 | Our particular human development, you know,
01:17:41.860 | stack of technologies that we use for computation right now
01:17:46.060 | is just one particular path, and we can, you know,
01:17:48.660 | so a very practical example of this is algorithmic drugs.
01:17:52.020 | So the question is, right now, drugs pretty much work by,
01:17:54.980 | most drugs work by, you know, there is a binding site
01:17:57.660 | on a molecule, drug fits into binding site, does something.
01:18:00.820 | Question is, can you imagine having something
01:18:03.480 | where the molecule, you know, is something
01:18:05.600 | which has computations going on in it,
01:18:08.060 | where it goes around and it looks at that, you know,
01:18:11.020 | that thing it's supposed to be binding to,
01:18:12.540 | and it figures out, oh, there's this knob here
01:18:14.540 | and that knob there, it reconfigures itself,
01:18:17.260 | it's computing something, it's trying to figure out,
01:18:19.980 | you know, is this likely to be a tumor cell or whatever,
01:18:22.500 | based on some more complicated thing.
01:18:24.580 | That's the type of thing that I mean
01:18:26.060 | by computations happening at molecular scale.
01:18:28.660 | - Okay, I guess I meant to ask, if it follows from that,
01:18:32.140 | if, in your view, like the molecules in the chalkboard
01:18:36.700 | and in my face and in the table are, in any sense,
01:18:39.620 | currently doing computing.
01:18:40.980 | - Sure, I mean, the question of what computation,
01:18:42.980 | look, one of the things to realize,
01:18:44.580 | if you look at kind of the sort of past and future of things,
01:18:48.100 | okay, so here's an observation, actually,
01:18:51.940 | it was about Leibniz, actually.
01:18:54.100 | In Leibniz's time, Leibniz made a calculator-type computer
01:18:58.660 | out of brass, took him 30 years, okay?
01:19:01.620 | So in his day, there was, you know,
01:19:03.780 | at most one computer in the world,
01:19:05.580 | as far as he was concerned, right?
01:19:07.300 | Today's world, there may be 10 billion computers,
01:19:09.260 | maybe 20 billion computers, I don't know.
01:19:11.780 | The question is, what's that gonna look like in the future?
01:19:14.220 | And I think the answer is that, in time,
01:19:17.260 | probably everything we have will be made of computers,
01:19:20.740 | in the following sense, that basically, it won't be,
01:19:23.340 | you know, in today's world, things are made of, you know,
01:19:25.780 | metal, plastic, whatever else,
01:19:27.740 | but actually, that won't make it,
01:19:29.380 | there won't be any point in doing that.
01:19:30.860 | Once we know how to do, you know,
01:19:32.660 | molecular-scale manufacturing and so on,
01:19:35.140 | we might as well just make everything
01:19:36.500 | out of programmable stuff.
01:19:38.420 | And I think that's a sense in which, you know,
01:19:42.540 | and, you know, the one example we have
01:19:44.100 | in molecular computing right now is us, in biology.
01:19:47.660 | You know, biology does a reasonable job
01:19:49.820 | of specific kinds of molecular computing.
01:19:52.380 | It's kind of embarrassing, I think,
01:19:53.740 | that the only, you know, molecule we know
01:19:56.060 | that's sort of a memory molecule is DNA,
01:19:58.420 | that's kind of, you know, which is kind of the, you know,
01:20:00.780 | the particular biological solution.
01:20:02.740 | In time, we'll know lots of others.
01:20:04.500 | And, you know, I think the, sort of the end point is,
01:20:09.100 | so if you're asking, is, you know,
01:20:11.060 | is computation going on in, you know,
01:20:13.220 | in this water bottle, the answer is absolutely.
01:20:15.660 | It's probably even many aspects of that computation
01:20:17.980 | are pretty sophisticated.
01:20:18.820 | If we wanted to know what would happen
01:20:20.540 | to particular molecules here, it's gonna be hard to tell.
01:20:23.120 | There's going to be computational irreducibility and so on.
01:20:25.740 | Can we make use of that for our human purposes?
01:20:28.660 | Can we piggyback on that to achieve something technological?
01:20:31.660 | That's a different issue.
01:20:32.980 | And that's, for that, we have to build up
01:20:34.860 | this whole, sort of, chain of technology
01:20:37.140 | to be able to connect it, which is what I've kind of been,
01:20:40.420 | been keep on talking about is, how do we connect,
01:20:43.740 | sort of, what is possible computationally in the universe
01:20:46.740 | to what we humans can kind of conceptualize
01:20:49.700 | that we want to do in computation?
01:20:51.060 | And that's, you know, that's the bridge that we have to make
01:20:53.220 | and that's the hard part.
01:20:54.220 | But getting, the intrinsic getting the computation done is,
01:20:57.260 | is, you know, there's computation going on
01:21:00.100 | all over the place.
01:21:01.100 | - Maybe a couple more questions.
01:21:04.580 | I was hoping you could elaborate on
01:21:07.020 | what you were talking about earlier of, like,
01:21:09.140 | searching the entire space of possible programs.
01:21:12.440 | So that's very broad.
01:21:14.780 | So maybe, like, what kind of searching of that space
01:21:18.740 | we're good at and, like, what we're not
01:21:20.380 | and I guess what the differences are.
01:21:21.220 | - Yeah, right, so, I mean, I would say that we're
01:21:23.260 | at an early stage in knowing how to do that, okay?
01:21:26.300 | So I've done lots of these things and they are,
01:21:29.900 | the thing that I've noticed is,
01:21:32.180 | if you do an exhaustive search,
01:21:34.220 | then you don't miss even things
01:21:35.820 | that you weren't looking for.
01:21:37.740 | If you do a non-exhaustive search,
01:21:39.700 | there is a tremendous tendency to miss things
01:21:42.620 | that you weren't looking for.
01:21:44.860 | And so, you know, we've done searches;
01:21:48.500 | a bunch of function evaluation in Wolfram Language
01:21:51.060 | was done by searching for optimal approximations
01:21:54.860 | in some big space.
01:21:56.380 | A bunch of stuff with hashing is done that way.
01:21:59.420 | Bunch of image processing is done that way.
01:22:01.540 | Where we're just sort of searching this, you know,
01:22:03.560 | doing exhaustive searches in maybe trillions of programs
01:22:06.540 | to find things.
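That kind of exhaustive search can be tried at toy scale in a line or two of Wolfram Language: enumerate all 256 elementary cellular automaton rules and look at every one, so nothing unexpected gets filtered out in advance:

    (* A tiny but genuinely exhaustive sweep of a space of programs: every
       elementary rule, run for 60 steps from a single black cell. *)
    GraphicsGrid[
     Partition[
      Table[ArrayPlot[CellularAutomaton[r, {{1}, 0}, 60], ImageSize -> 50],
       {r, 0, 255}],
      16]]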
01:22:07.620 | Now, you know, there is, on the other side of that story
01:22:11.180 | is the incremental improvements story
01:22:13.780 | with deep learning and neural networks and so on,
01:22:17.420 | where because there is differentiability,
01:22:20.200 | you're able to sort of incrementally
01:22:22.300 | get to a better solution.
01:22:23.900 | Now, in fact, people are making less and less
01:22:26.580 | differentiability in deep learning neural nets.
01:22:29.500 | And so, I think eventually there's going to be
01:22:31.740 | sort of a grand unification of these kinds of approaches.
01:22:36.420 | Right now, we're still, you know, I don't really know
01:22:39.460 | what the, you know, the exhaustive search side of things,
01:22:42.680 | which you can use for all sorts of purposes.
01:22:44.820 | I mean, the reason, the surprising thing
01:22:47.120 | that makes exhaustive search not crazy
01:22:49.420 | is that there is rich, sophisticated stuff
01:22:53.300 | near at hand in the computational universe.
01:22:55.180 | If you had to go, you know, quadrillions, you know,
01:22:58.200 | through a quadrillion cases
01:22:59.540 | before you ever found anything,
01:23:01.240 | exhaustive search would be hopeless.
01:23:03.160 | But you don't in many cases.
01:23:05.480 | And, you know, I would say that we are
01:23:07.400 | in a fairly primitive stage of the science
01:23:10.180 | of how to do those searches.
01:23:11.220 | Well, my guess is that there'll be
01:23:13.620 | some sort of unification, which needless to say,
01:23:15.980 | I've thought a bunch about,
01:23:17.100 | and between kind of the neural net.
01:23:20.160 | So, you know, the trade-off typically in neural nets
01:23:22.420 | is you can have a neural net that is very good at,
01:23:25.820 | that is, you know, uses its computational resources well,
01:23:29.200 | but it's really hard to train,
01:23:30.900 | or you can have a neural net
01:23:31.900 | that doesn't use its computational resources so well,
01:23:34.360 | but it's very easy to train,
01:23:35.680 | because it's very, you know, smooth.
01:23:38.520 | And, you know, my guess is that somewhere in the,
01:23:41.260 | you know, harder to train,
01:23:43.100 | but makes use of things that are closer
01:23:45.520 | to the complete computational universe
01:23:47.960 | is where one's going to see progress.
01:23:50.340 | But it's a really interesting area,
01:23:52.860 | and, you know, I consider us only at the beginning
01:23:56.600 | of figuring that out.
01:23:57.700 | - One last question.
01:23:59.940 | - Hi.
01:24:00.780 | - Hello, keep going?
01:24:01.620 | - Yeah, okay. - All right, let's do it.
01:24:03.740 | - Thank you for your talk.
01:24:04.840 | Just to give a bit of context for my question,
01:24:06.900 | I research how we could teach AI to kids,
01:24:09.260 | and developing platforms for that,
01:24:10.780 | how we could teach artificial intelligence
01:24:12.220 | and machine learning to children,
01:24:13.280 | and I know you develop resources for that as well.
01:24:16.340 | So, I was wondering, like,
01:24:18.820 | where do you think it's problematic
01:24:20.700 | that we have computation that is very efficient,
01:24:23.020 | and can, you know, from a utilitarian
01:24:24.980 | and problem-solving perspective,
01:24:26.940 | it achieves all the goals,
01:24:28.220 | but we don't understand how it works,
01:24:30.300 | so we have to create these fake steps,
01:24:33.180 | and if you could think of scenarios
01:24:34.860 | where that could become very problematic over time,
01:24:37.180 | and why do we approach it in such a deterministic way?
01:24:40.740 | And when you mentioned that computation and intelligence
01:24:43.100 | are differentiated by this, like, very thin line,
01:24:46.500 | how does that affect the way you learn,
01:24:48.060 | and how do you think that will affect
01:24:49.180 | the way kids learn, the way we learn?
01:24:51.580 | - Right, so I mean, look, my general principle about,
01:24:54.540 | you know, future generations and what they should learn,
01:24:57.860 | first point is, you know, very obvious point,
01:25:00.820 | that for every field that people study,
01:25:04.380 | you know, archeology to zoology,
01:25:06.700 | there either is now a computational X,
01:25:09.820 | or there will be soon.
01:25:11.260 | So, you know, every field, the paradigm of computation
01:25:15.100 | is becoming important,
01:25:17.300 | perhaps the dominant paradigm in that field.
01:25:19.620 | Okay, so how do you teach kids to be useful
01:25:22.620 | in a world where everything is computational?
01:25:25.900 | I think the number one thing is to teach them
01:25:29.820 | how to think in computational terms.
01:25:32.500 | What does that mean?
01:25:33.340 | It doesn't mean writing code, necessarily.
01:25:36.340 | I mean, in other words, one of the things
01:25:38.220 | that's happening right now as a practical matter
01:25:40.500 | is, you know, there've been these waves of enthusiasm
01:25:42.500 | for teaching coding of various kinds.
01:25:44.580 | You know, we're in a, actually we're in the end
01:25:46.780 | of an uptick wave, I think.
01:25:49.060 | It's going down again.
01:25:51.100 | You know, it's been up and down for 40 years or so.
01:25:54.860 | Okay, why doesn't that work?
01:25:56.820 | Well, it doesn't work because while there are people,
01:25:58.980 | like people who are students at MIT, for example,
01:26:01.700 | for whom they really want to learn, you know,
01:26:03.900 | engineering style coding,
01:26:05.820 | and it really makes sense for them to learn that,
01:26:08.380 | the vast majority of people,
01:26:09.860 | it's just not going to be relevant
01:26:11.860 | because they're not going to write
01:26:12.980 | a low-level C program or something.
01:26:15.340 | And it's the same thing that's happened in math education,
01:26:18.100 | which has been sort of a disaster there,
01:26:20.140 | which is the number one takeaway for most people
01:26:23.300 | from the math they learn in school is, I don't like math.
01:26:27.380 | And, you know, that's not for all of them, obviously,
01:26:30.500 | but that's, you know, what you find if you ask on a general scale.
01:26:33.740 | And why is that?
01:26:35.660 | Well, part of the reason is because what's been taught
01:26:37.620 | is rather low-level and mechanical.
01:26:39.700 | It's not about mathematical thinking, particularly.
01:26:43.020 | It's mostly about, you know, what teachers can teach
01:26:45.500 | and what assessment processes can assess and so on.
01:26:47.980 | Okay, so how should one teach computational thinking?
01:26:51.100 | I mean, I'm kind of excited about what we can do
01:26:53.620 | with Wolfram Language because I think we have
01:26:55.540 | a high enough level language that people can actually write,
01:27:00.180 | you know, that, for example, I reckon by age 11 or 12,
01:27:04.420 | and I've done many experiments on this,
01:27:05.940 | so I have some, the only problem with my experiments
01:27:08.700 | is most of my experiments end up being with kids
01:27:10.780 | who are high-achieving kids.
01:27:13.020 | Despite many efforts to reach lower-achieving kids,
01:27:15.500 | it always ends up that the kids who actually do the things
01:27:18.620 | that I set up are the high-achieving kids.
01:27:20.620 | But, you know, setting that aside,
01:27:23.020 | you know, you take the typical, you know,
01:27:27.420 | 11, 12, 13-year-olds and so on,
01:27:29.620 | and they can learn how to write stuff in this language,
01:27:33.140 | and what's interesting is they learn to start thinking,
01:27:36.060 | here, I'll show you, let's be very practical.
01:27:37.740 | I can show you, I was doing, every Sunday,
01:27:39.780 | I do a little thing with some middle school kids,
01:27:42.860 | and I might even be able to find my stuff from yesterday.
01:27:45.820 | This is, okay, let's see.
01:27:48.620 | Programming Adventures, January 28th.
01:27:52.300 | Okay, let's see what I did.
01:27:53.740 | Oh, look at that.
01:27:54.580 | That was why I thought of the South America thing here,
01:27:56.860 | because I'd just done that with these kids.
01:27:59.020 | And so, what are we doing?
01:28:03.740 | We were trying to figure out this,
01:28:06.820 | trying to figure out the shortest tour thing
01:28:10.900 | that I just showed you, which is,
01:28:12.820 | this is where I got what to show you,
01:28:14.820 | is what I was doing with these kids.
01:28:16.740 | But this was my version of this,
01:28:19.060 | but the kids all had various different versions of this,
01:28:21.980 | and we had somebody suggested, let's just enumerate,
01:28:26.980 | let's just look at all possible permutations
01:28:29.620 | of these cities and figure out what their distances are.
01:28:33.540 | There's the histogram of those.
01:28:35.380 | That's what we get from those.
01:28:36.860 | Okay, how do you get the largest distance from those,
01:28:39.660 | et cetera, et cetera, et cetera.
01:28:40.940 | And this is, okay, this was my version of it,
01:28:42.820 | but the kids had similar stuff.
01:28:44.980 | And this is, I think, and it probably went off into,
01:28:49.740 | oh yeah, there we go, there's the one for the whole Earth,
01:28:52.860 | and then they wanted to know, how do you do that in 3D?
01:28:55.900 | So I was showing them how to convert
01:28:58.540 | to XYZ coordinates in 3D
01:29:01.100 | and make the corresponding thing in 3D.
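A minimal Wolfram Language sketch of the kind of session described here (the city list, the helper name, and the plots are illustrative, not the actual class notebook; the tour length here is an open path, which is good enough for the sketch):

    cities = {"Lima", "Bogota", "Santiago", "Quito", "Caracas"};   (* illustrative list *)
    coords = CityData[#, "Coordinates"] & /@ cities;               (* {lat, lon} pairs *)

    (* brute force: total geo length of every possible ordering of the cities *)
    tourLength[perm_] := Total[GeoDistance @@@ Partition[perm, 2, 1]]
    lengths = tourLength /@ Permutations[coords];

    Histogram[QuantityMagnitude[lengths, "Kilometers"]]          (* distribution of tour lengths *)
    Max[lengths]                                                 (* the longest tour *)
    FindShortestTour[coords, DistanceFunction -> GeoDistance]    (* the shortest tour *)

    (* the 3D follow-up: convert lat/lon to Cartesian XYZ and draw the tour in space *)
    xyz = First[GeoPositionXYZ[GeoPosition[#]]] & /@ coords;
    Graphics3D[Line[xyz]]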
01:29:03.260 | So what's, this maybe isn't,
01:29:07.060 | this is a random example from yesterday,
01:29:08.700 | so it's not a highly considered example,
01:29:11.340 | but what I think is interesting is that
01:29:15.780 | we seem to have finally reached the point
01:29:17.500 | where we've automated enough
01:29:19.300 | of the actual doing of the computation
01:29:21.980 | that the kids can be exposed mostly
01:29:24.980 | to the thinking about what you might want to compute.
01:29:27.860 | And part of our role in language design,
01:29:30.860 | as far as I'm concerned,
01:29:32.140 | is to get it as much as possible
01:29:34.020 | to the point where, for example,
01:29:35.340 | you can do a bunch of natural language input,
01:29:37.420 | you can do things which make it as easy as possible
01:29:40.900 | for kids to not get mixed up in the kind of what the,
01:29:44.460 | how the computation gets done,
01:29:45.900 | but rather to just think about
01:29:47.100 | how you formulate the computation.
01:29:48.460 | So for example, a typical example I've used a bunch of times
01:29:51.380 | in what does it mean to write code versus do other things?
01:29:55.500 | Like a typical sort of test example would be,
01:29:58.540 | I don't know, you ask somebody,
01:30:00.620 | you're gonna, there's a practical problem
01:30:02.780 | we had in Wolfram Alpha,
01:30:02.780 | you give a lat-long position on the Earth,
01:30:05.100 | and you say, you're gonna make a map
01:30:07.020 | of that lat-long position.
01:30:08.940 | What scale of map should you make?
01:30:11.220 | Right, so if the lat-long is in the middle of the Pacific,
01:30:13.540 | making a 10-mile radius map isn't very interesting.
01:30:17.580 | If it's in the middle of Manhattan,
01:30:18.820 | a 10-mile radius map might be quite a sensible thing to do.
01:30:22.180 | So the question is, come up with an algorithm,
01:30:23.820 | come up with even a way of thinking about that question.
01:30:26.460 | What do you do?
01:30:27.780 | How should you figure that out?
01:30:29.220 | Well, you might say,
01:30:30.860 | oh, let's look at the visual complexity of the image.
01:30:33.500 | Let's look at how far it is to another city.
01:30:37.020 | That's far, you know, there are various different things,
01:30:38.940 | but thinking about that
01:30:40.420 | as a kind of computational thinking exercise
01:30:43.140 | that is, you know, that's the kind of thing.
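One hedged way to make that exercise concrete in Wolfram Language (an illustrative heuristic, not the algorithm Wolfram Alpha actually uses): base the map radius on the distance to the nearest city, clamped to a sensible range.

    mapRadius[pos_GeoPosition] := Module[{d},
      d = GeoDistance[pos, First[GeoNearest["City", pos]]];
      Min[Max[d, Quantity[10, "Kilometers"]], Quantity[2000, "Kilometers"]]]

    pos = GeoPosition[{40.75, -74.0}];   (* mid-Manhattan gives a small radius; mid-Pacific a large one *)
    GeoGraphics[GeoMarker[pos], GeoCenter -> pos, GeoRange -> mapRadius[pos]]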
01:30:48.140 | So in terms of what one automates
01:30:50.340 | and whether people need to understand how it works inside,
01:30:53.740 | okay, main point is you'll,
01:30:59.980 | in the end, it will not be possible
01:31:01.740 | to know how it works inside.
01:31:03.540 | So you might as well stop having that be a criterion.
01:31:06.260 | I mean, that is, there are plenty of things
01:31:07.940 | that one teaches people that are,
01:31:09.820 | let's say in lots of areas of biology, medicine,
01:31:14.460 | whatever else, you know,
01:31:16.260 | maybe we'll know how it works inside one day,
01:31:18.580 | but you can still, there's an awful lot of useful stuff
01:31:20.660 | you can teach without knowing how it works inside.
01:31:23.260 | And I think also, as we get computation
01:31:25.620 | to be more efficient, inevitably,
01:31:27.020 | we will be dealing with things
01:31:28.060 | where you don't know how it works inside.
01:31:29.580 | Now, you know, we've seen this in math education
01:31:31.460 | 'cause I've happened to make tools
01:31:33.540 | that automate a bunch of things
01:31:34.980 | that people do in math education.
01:31:37.100 | And I think, well, to tell a silly story,
01:31:40.460 | I mean, my older daughter,
01:31:42.540 | who at some point in the past was doing calculus,
01:31:45.540 | you know, and learning doing integrals and things,
01:31:47.220 | and I was saying to her, you know,
01:31:49.300 | I didn't think humans still did that stuff anymore,
01:31:53.500 | which was a very unendearing comment.
01:31:55.420 | But in any case, I mean, you know,
01:31:59.140 | there's a question of whether do humans need
01:32:01.580 | to know how to do that stuff or not?
01:32:03.700 | So I haven't done an integral by hand in probably 35 years.
01:32:07.300 | That true?
01:32:09.060 | More or less true.
01:32:10.540 | But when I was using computers to do them,
01:32:13.460 | I was for a while, you know,
01:32:16.380 | when I used to do physics and so on,
01:32:17.740 | I used computers to do this stuff,
01:32:19.340 | I was a really, really good integrator,
01:32:22.420 | except that it wasn't really me,
01:32:24.380 | it was me plus the computer.
01:32:25.940 | So how did that come to be?
01:32:27.020 | Well, the answer was that because I was doing things
01:32:29.580 | by computer, I was able to try zillions of examples,
01:32:33.020 | and I got a much better intuition than most people got
01:32:36.020 | for how these things would work roughly,
01:32:37.940 | how what you did to make the thing go and so on.
01:32:41.220 | Whereas people who are like,
01:32:42.220 | I'm just working this one thing out by hand,
01:32:44.660 | you get a different, you know,
01:32:45.860 | you don't get that intuition.
01:32:47.140 | So I think, you know, two points.
01:32:48.940 | First of all, you know, this,
01:32:51.060 | how do you think about things computationally?
01:32:52.980 | How do you formulate the question computationally?
01:32:55.140 | That's really important and something that we are now
01:32:57.700 | in a position, I think, to actually teach.
01:32:59.860 | And it is not really something you teach by,
01:33:02.620 | you know, teaching, you know, traditional quotes coding,
01:33:06.020 | because a lot of that is, okay, we're gonna make a loop,
01:33:08.420 | we're gonna define variables.
01:33:10.140 | I just as a, I think I probably have a copy here, yeah.
01:33:13.300 | I wrote this book for,
01:33:14.780 | this is a book kind of for kids about Wolfram Language,
01:33:17.220 | except it seems to be useful to adults as well,
01:33:19.500 | but I wrote it for kids.
01:33:20.940 | So it's, one of the amusing things in this book
01:33:24.820 | is it doesn't talk about assigning values to variables
01:33:28.620 | until chapter 38.
01:33:30.820 | So in other words,
01:33:31.860 | that will be a thing that you would find in chapter one
01:33:34.260 | of most, you know, low level programming,
01:33:37.220 | coding type things.
01:33:38.660 | It turns out it's not that relevant to know how to do that.
01:33:41.140 | It's also kind of confusing and not necessary.
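The flavor of that approach, for what it's worth, is that a lot of real computation can be written as single expressions with no assignments at all; a couple of illustrative one-liners (these examples are mine, not taken from the book):

    ListPlot[Accumulate[RandomInteger[{-1, 1}, 100]]]                      (* a random walk *)
    WordCloud[DeleteStopwords[ExampleData[{"Text", "AliceInWonderland"}]]] (* word cloud of a book *)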
01:33:45.380 | And so, you know, in terms of the,
01:33:48.260 | you asked where will we get in trouble
01:33:49.540 | when people don't know how the stuff works inside.
01:33:52.060 | That's, I mean, you know,
01:33:55.380 | I think one just has to get used to that
01:33:57.060 | because it's like, you know, you might say,
01:33:59.340 | well, we live in the world
01:34:00.500 | and it's full of natural processes
01:34:02.300 | where we don't know how they work inside,
01:34:04.100 | but somehow we managed to survive
01:34:06.220 | and we go to a lot of effort to do natural science
01:34:08.900 | to try and figure out how stuff works inside.
01:34:11.580 | But it turns out we can still use plenty of things
01:34:13.820 | even when we don't know how they work inside.
01:34:15.700 | We don't need to know.
01:34:17.620 | And I think the, I mean, I think the main point is
01:34:20.740 | computational irreducibility guarantees
01:34:22.860 | that we will be using things where we don't know
01:34:25.140 | and can't know how they work inside.
01:34:27.900 | And, you know, I think the perhaps,
01:34:30.900 | the thing that is a little bit, you know,
01:34:34.340 | to me a little bit unfortunate as a, you know,
01:34:37.980 | as a typical human type thing,
01:34:40.620 | the fact that I can readily see that, you know,
01:34:43.380 | the AI stuff we build is sort of effectively
01:34:47.300 | creating languages and things
01:34:49.420 | that are completely outside our domain to understand.
01:34:52.260 | And where, by that I mean, you know,
01:34:55.020 | our human language with its 50,000 words or whatever
01:34:57.540 | has been developed over the last however many,
01:34:59.620 | you know, tens of thousands of years.
01:35:01.500 | And as a society, we've developed this way
01:35:03.940 | of communicating and explaining things.
01:35:05.980 | You know, the AIs are reproducing that process very quickly,
01:35:10.980 | but they're coming up with a, an ahistorical,
01:35:14.100 | you know, something, you know,
01:35:15.300 | their way of describing the world,
01:35:16.700 | but it doesn't happen to relate at all
01:35:18.180 | to our historical way of doing it.
01:35:20.100 | And that's, you know, it's a little bit disquieting
01:35:23.140 | to me as a human that, you know,
01:35:24.740 | there are things going on inside where I know it is,
01:35:26.980 | you know, in principle, I could learn that language,
01:35:29.900 | but it's, you know, not the historical one
01:35:33.420 | that we've all learned.
01:35:34.740 | And it really wouldn't make a lot of sense to do that
01:35:36.580 | 'cause you learn it for one AI
01:35:37.820 | and then another one gets trained
01:35:39.300 | and it's gonna use something different.
01:35:41.420 | So it's, but my main, I guess my main point for education,
01:35:45.820 | so another point about education I'll just make,
01:35:47.660 | which is something I haven't figured out,
01:35:48.940 | but just is, you know, when do we get to make a good model
01:35:54.580 | for the human learner using machine learning?
01:35:57.700 | So in other words, you know,
01:35:59.380 | part of what we're trying to do,
01:36:00.540 | like I've got that automated proof,
01:36:02.980 | I would really like to manage to figure out a way,
01:36:05.860 | what is the best way to present that proof
01:36:07.940 | so a human can understand it?
01:36:09.740 | And basically for that,
01:36:11.820 | we have to have a bunch of heuristics
01:36:13.460 | about how humans understand things.
01:36:14.900 | So as an example, if we're doing, let's say,
01:36:17.540 | a lot of visualization stuff in Wolfram Language, okay,
01:36:20.180 | we have tried to automate, do automated aesthetics.
01:36:23.820 | So what we're doing is, you know, we're laying out a graph,
01:36:27.500 | what way of laying out that graph
01:36:29.260 | is most likely for humans to understand it?
01:36:31.580 | And we've done that, you know,
01:36:32.580 | by building a bunch of heuristics and so on,
01:36:34.740 | but that's an example of, you know,
01:36:36.420 | if we could do that for learning,
01:36:38.500 | and we say, what's the optimal path,
01:36:40.180 | given that the person is trying to understand this proof,
01:36:42.300 | for example, what's the optimal path to lead them through
01:36:45.380 | understanding that proof?
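On the graph-layout point a few sentences back, a small sketch of the kind of choice those aesthetic heuristics are making; the layout names are standard Wolfram Language options, and the graph itself is just a random example:

    g = RandomGraph[BarabasiAlbertGraphDistribution[30, 2]];
    GraphicsRow[Table[Graph[EdgeList[g], GraphLayout -> l],
      {l, {"SpringElectricalEmbedding", "CircularEmbedding", "SpiralEmbedding"}}]]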
01:36:46.780 | I suspect we will learn a lot more
01:36:48.500 | in probably fairly small number of years about that.
01:36:51.980 | And it will be the case that, you know,
01:36:53.620 | for example, if you've got, oh, I don't know,
01:36:56.740 | you can do simple things like, you know,
01:36:58.700 | you go to Wikipedia and you look at what the path of,
01:37:01.460 | you know, how do you, if you wanna learn this concept,
01:37:03.180 | what other concepts do you have to learn?
01:37:04.740 | We have much more detailed symbolic information
01:37:07.060 | about what is actually necessary to know
01:37:09.780 | in order to understand this and so on.
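A toy version of that "what do you need to know first?" computation; the concept graph here is made up purely for illustration:

    prereqs = {"arithmetic" -> "algebra", "algebra" -> "functions",
       "algebra" -> "calculus", "functions" -> "calculus"};
    g = Graph[prereqs, VertexLabels -> "Name"];

    (* everything that feeds into a target concept, in an order you could learn it *)
    learningPath[target_] := TopologicalSort[Subgraph[g, VertexInComponent[g, target]]]
    learningPath["calculus"]   (* -> {"arithmetic", "algebra", "functions", "calculus"} *)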
01:37:12.300 | It is, I think, reasonably likely
01:37:14.780 | that we will be able to, I mean, you know,
01:37:16.860 | if I look at, I was interested recently
01:37:18.700 | in the history of math education.
01:37:20.060 | So I wanted to look at the complete sort of path
01:37:22.980 | of math textbooks, you know, for the past,
01:37:26.100 | well, basically since, like, 1200, you know, when
01:37:29.900 | Fibonacci produced one of the early math textbooks.
01:37:33.060 | So there've been these different ways of teaching math.
01:37:35.660 | And, you know, I think we've gradually evolved
01:37:38.500 | a fairly optimized way for the typical person,
01:37:41.500 | though it's probably the variation of the population
01:37:43.940 | is not well understood, for, you know,
01:37:45.860 | how to explain certain concepts.
01:37:47.900 | And we've gone through some pretty weird ways of doing it
01:37:50.460 | from the 1600s and so on,
01:37:52.380 | where which have gone out of style and possibly,
01:37:55.620 | you know, who knows whether that's because of, well,
01:37:58.860 | but anyway, so, you know, we've kind of learned this path
01:38:01.580 | of what's the optimal way to explain adding fractions
01:38:04.260 | or something for humans, for the typical human.
01:38:07.420 | But I think we'll learn a lot more about how, you know,
01:38:09.700 | by essentially making a model for the human,
01:38:11.980 | a machine model for the human,
01:38:13.500 | we'll learn more about how to, you know,
01:38:16.340 | how to optimize, how to explain stuff to humans,
01:38:19.260 | a coming attraction.
01:38:20.740 | But-- - Thanks.
01:38:22.180 | - By the way, do you think we're close to that at all?
01:38:24.580 | 'Cause you said that there's something in Wolfram Alpha
01:38:28.300 | that presents it to the human in a nice way.
01:38:31.740 | How far are we? You said coming attraction, 10 years?
01:38:34.540 | - Yeah, right, so I mean, in that explaining stuff
01:38:39.340 | to humans thing is a lot of human work right now.
01:38:43.340 | Being able to automate explaining stuff to humans.
01:38:46.740 | Okay, so some of these things, let's see.
01:38:50.980 | I mean, so an interesting question,
01:38:53.740 | actually just today I was working on something
01:38:55.380 | that's related to this.
01:38:56.660 | Yeah, it's being able to,
01:38:58.980 | the question is given a whole bunch of,
01:39:02.100 | can we, for example, train a machine learning system
01:39:05.420 | from explanations that it can see, roughly,
01:39:08.540 | can we train it to give explanations
01:39:10.340 | that are likely to be understandable?
01:39:12.180 | Maybe.
01:39:13.340 | I think the, okay, so an example that I'd like to do,
01:39:16.900 | okay, I'd like to do a debugging assistant
01:39:19.540 | where the typical thing is program runs,
01:39:21.940 | program gives wrong answer.
01:39:23.660 | Human says, why did you get the wrong,
01:39:25.580 | why did it give the wrong answer?
01:39:27.180 | Well, the first piece of information to the computer is
01:39:29.580 | that was, the human thought that was the wrong answer
01:39:32.300 | 'cause the computer just did what it was told
01:39:34.460 | and it didn't know that was supposed to be the wrong answer.
01:39:36.500 | So then the question is, can you in fact,
01:39:39.140 | in that domain, can you actually have
01:39:41.580 | a reasonable conversation in which the human
01:39:44.980 | is explaining the computer what they thought
01:39:46.540 | it was supposed to do, the computer is explaining
01:39:48.580 | what happened and why did it happen and so on.
01:39:50.900 | Same kind of thing for math tutoring.
01:39:53.700 | You know, we have a lot of, you know,
01:39:55.500 | we've got a lot of stuff, you know,
01:39:57.300 | we're sort of very widely used for people
01:39:59.380 | who want to understand the steps in math.
01:40:02.220 | You know, can we make a thing where people tell us,
01:40:04.420 | I think it's this?
01:40:05.780 | Okay, I'll tell you one little factoid,
01:40:07.460 | which I, which it did work out.
01:40:08.940 | So if you do multi-digit arithmetic,
01:40:11.540 | multi-digit addition, okay?
01:40:13.820 | Okay, so the basis of this is,
01:40:15.980 | it's kind of silly, silly thing,
01:40:18.340 | but you know, if you get the right answer
01:40:20.100 | for an addition sum, okay,
01:40:22.140 | you don't get very much information.
01:40:23.900 | The student gives the wrong answer,
01:40:25.980 | the question is, can you tell them where they went wrong?
01:40:28.860 | So let's say you have a four-digit addition sum
01:40:31.340 | and the student gives the wrong answer.
01:40:33.260 | Can you backtrace and figure out what they likely did wrong?
01:40:35.980 | And the answer is you can.
01:40:37.620 | You just make this graph of all the different things
01:40:40.260 | that can happen, you know, when did they,
01:40:42.540 | you know, there's certain things that are more common,
01:40:44.500 | transposing numbers and things,
01:40:45.900 | or you know, having a one and a seven mixed up,
01:40:48.660 | those kinds of things.
01:40:49.700 | You can, with very high probability,
01:40:51.860 | given a four-digit addition sum with the wrong answer,
01:40:54.900 | you can say this is the mistake you made,
01:40:57.380 | which is sort of interesting.
01:40:58.980 | And that's, you know, being done in a fairly symbolic way,
01:41:02.340 | whether one can do that in a, you know,
01:41:05.020 | more machine learning kind of way
01:41:06.660 | for more complicated derivations, I'm not sure.
01:41:09.340 | But that's a, you know, that's one that works.
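A rough sketch of that backtracing idea; the catalogue of error types below (adjacent-digit transposition and 1/7 confusion) is just illustrative, not the production system:

    digits[n_] := IntegerDigits[n, 10, 4]

    (* candidate misreadings of a number: the number itself, adjacent digits swapped, or 1 and 7 confused *)
    misreadings[n_] := DeleteDuplicates @ Join[{n},
       Table[FromDigits[Permute[digits[n], Cycles[{{i, i + 1}}]]], {i, 3}],
       {FromDigits[digits[n] /. {1 -> 7, 7 -> 1}]}]

    (* which misreadings of a and b are consistent with the student's wrong answer? *)
    diagnose[a_, b_, wrong_] :=
      Select[Tuples[{misreadings[a], misreadings[b]}], Total[#] == wrong &]

    diagnose[1234, 5678, 6921]   (* finds the transposed-last-digits explanations *)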
01:41:14.020 | - Hi, sir, I just had a follow-up question.
01:41:16.940 | So do you think, you know, like in the future,
01:41:20.060 | is it possible to simulate virtual environments
01:41:23.780 | which can actually understand how the human mind works
01:41:27.380 | and then build, you know, like finite state machines
01:41:30.260 | inside of this virtual environment
01:41:32.060 | to provide a better learning experience
01:41:36.060 | and a more personalized learning experience?
01:41:39.140 | - Well, I mean, so the question is,
01:41:41.140 | if you're going to, you know,
01:41:43.740 | can you optimize, if you're playing a video game
01:41:45.780 | or something and that video game
01:41:46.900 | is supposed to be educational,
01:41:48.500 | can you optimize the experience
01:41:51.220 | based on a model of you, so to speak?
01:41:54.260 | Yeah, I'm sure the answer is yes.
01:41:56.060 | And I'm sure the, you know,
01:41:57.380 | the question of how complicated the model of you will be
01:42:00.460 | is an interesting question.
01:42:01.700 | I don't know the answer to.
01:42:02.660 | I mean, I've kind of wondered a similar question.
01:42:04.740 | So I'm a kind of personal analytics enthusiast,
01:42:07.980 | so I collect tons of data about myself.
01:42:10.300 | And I mean, I do it mostly
01:42:11.980 | 'cause it's been super easy to do
01:42:13.340 | and I've done it for like 30 years.
01:42:15.260 | And I have, you know,
01:42:16.100 | every keystroke I've typed on a computer,
01:42:17.740 | like every keystroke I've typed here.
01:42:19.260 | And I, the screen of my computer,
01:42:21.060 | every 30 seconds or so, maybe 15 seconds,
01:42:24.220 | I'm not sure, there's a screenshot.
01:42:26.380 | It's a super boring movie to watch.
01:42:28.100 | But anyway, I've been collecting all this stuff.
01:42:30.540 | And so the question that I've asked is,
01:42:33.020 | is there enough data that a bot of me could be made?
01:42:36.980 | In other words, do I have enough data about,
01:42:39.860 | you know, I've got, I've written a million emails,
01:42:43.540 | I have all of those,
01:42:44.380 | I've received 3 million emails over that period of time.
01:42:48.540 | I've got, you know, endless, you know,
01:42:50.820 | things I've typed, et cetera, et cetera, et cetera.
01:42:53.460 | You know, is there enough data to reconstruct,
01:42:56.140 | you know, me basically?
01:42:59.620 | I think the answer is probably yes.
01:43:01.860 | Not sure, but I think the answer is probably yes.
01:43:04.260 | And so the question is in an environment
01:43:06.620 | where you're interacting with some video game,
01:43:08.380 | trying to learn something, whatever else,
01:43:10.220 | you know, how long is it going to be
01:43:11.540 | before it can learn enough about you
01:43:13.780 | to change that environment in a way that's useful
01:43:15.860 | for explaining the next thing to you?
01:43:18.100 | I would guess, I would guess that if done,
01:43:21.100 | that this is comparatively easy.
01:43:23.500 | I might be wrong, but, and that the,
01:43:26.540 | I mean, I think, you know, it's an interesting thing
01:43:29.060 | because, you know, one's dealing with, you know,
01:43:31.220 | there's a space of human personalities,
01:43:32.980 | there's a space of human learning styles.
01:43:35.420 | You know, I'm sort of always interested
01:43:37.060 | in the space of all possible XYZ.
01:43:39.900 | And there's, you know, there's that question
01:43:41.780 | of how do you parameterize the space
01:43:43.420 | of all possible human learning styles?
01:43:45.740 | And is there a way that we will learn, you know,
01:43:49.060 | like, can we do that symbolically
01:43:52.140 | and say these are 10 learning styles, or is it something,
01:43:54.660 | I think that's a case where it's better to use, you know,
01:43:58.220 | sort of soft machine learning type methods
01:44:02.220 | to kind of feel out that space.
01:44:04.420 | - Thank you.
01:44:07.180 | - Yeah, maybe, very last question.
01:44:10.660 | - I was just intuitively thinking
01:44:13.140 | when you spoke about an ocean,
01:44:14.540 | I thought of Isaac Newton when he said,
01:44:17.500 | you know, the famous quote, "I might not."
01:44:22.340 | And I thought instead of Newton on the beach,
01:44:25.020 | what if Franz Liszt were there?
01:44:27.540 | What question would he ask?
01:44:29.700 | What would he say?
01:44:30.700 | And I'm trying to understand your,
01:44:34.540 | the alien ocean and humans
01:44:39.020 | through maybe Franz Liszt and music.
01:44:41.340 | - Well, so, I mean, the quote from Newton is,
01:44:45.820 | it's sort of an interesting quote.
01:44:48.380 | I think it goes something like this.
01:44:49.620 | If, you know,
01:44:50.460 | people are talking about how wonderful calculus
01:44:55.380 | and all that kind of thing are,
01:44:56.980 | and Newton says, you know,
01:44:59.820 | "To others, I may seem like I've done a lot of stuff,
01:45:01.860 | but to me, I seem like a child
01:45:03.500 | who picked up a particularly elegant seashell on the beach.
01:45:08.500 | And I've been studying this seashell for a while,
01:45:11.740 | even though there's this ocean of truth out there
01:45:13.780 | waiting to be discovered."
01:45:14.660 | That's roughly the quote, okay?
01:45:17.020 | I find that quote interesting for the following reason.
01:45:20.700 | What Newton did was, you know, calculus and things like it,
01:45:25.340 | if you look at the computational universe
01:45:26.940 | of all possible programs, there is a small corner.
01:45:30.100 | Newton was exactly right in what he said.
01:45:32.060 | That is, he picked off calculus,
01:45:34.660 | which is a corner of the possible things
01:45:36.620 | that can happen in the computational universe
01:45:39.260 | that happened to be an elegant seashell, so to speak.
01:45:42.220 | They happened to be a case where you can figure out
01:45:44.380 | what's going on and so on,
01:45:46.580 | while there is still this sort of ocean
01:45:48.580 | of other sort of computational possibilities out there.
01:45:53.140 | But when it comes to, you know, you're asking about music,
01:45:55.620 | I, oh, I think my computer stopped
01:45:57.500 | being able to get anywhere,
01:45:58.580 | but sort of interesting, the,
01:46:02.020 | see if we can get to the site.
01:46:04.660 | Yeah, so this is a website that we made years ago,
01:46:09.660 | and now my computer isn't playing anything, but.
01:46:14.300 | (upbeat music)
01:46:16.900 | Let's try that.
01:46:20.500 | Okay, so these things are created
01:46:26.300 | by basically just searching computational universe
01:46:29.260 | of possible programs.
01:46:30.780 | It's sort of interesting because every one
01:46:32.980 | has kind of a story.
01:46:34.300 | Some of them look more interesting than others.
01:46:35.820 | Let's try that one.
01:46:36.780 | Anyway, the, what's interesting,
01:46:49.060 | actually, what was interesting to me about this was,
01:46:51.460 | this is a very trivial, you know,
01:46:53.340 | what this is doing is very trivial at some level.
01:46:55.780 | It's just, it actually happens to use cellular automata.
01:46:58.820 | You can even have it show you, I think, someplace here.
01:47:02.180 | Where is it?
01:47:03.020 | Somewhere there's a way of showing,
01:47:03.900 | you know, show the evolution.
01:47:05.020 | This is showing the behind the scenes
01:47:07.620 | what was actually happening,
01:47:09.660 | what it chose to use to generate that musical piece.
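A rough sketch of the idea behind that site; the rule, the scale, and the note mapping below are illustrative choices, not the actual WolframTones parameters:

    ca = CellularAutomaton[30, RandomInteger[1, 12], 24];     (* rule 30 from a random 12-cell start *)
    scale = {0, 2, 4, 7, 9, 12, 14, 16, 19, 21, 24, 26};      (* a pentatonic pitch for each column *)
    notes = Table[
       With[{on = Pick[scale, ca[[t]], 1]},                   (* cells that are on become a chord *)
         SoundNote[If[on === {}, None, on], 0.25]],           (* an all-zero row becomes a rest *)
       {t, Length[ca]}];
    Sound[notes]
    ArrayPlot[ca]                                             (* the behind-the-scenes evolution *)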
01:47:12.420 | And what I thought was interesting about this site,
01:47:16.940 | I thought, well, you know,
01:47:19.660 | how would computers be relevant to music,
01:47:21.740 | et cetera, et cetera, et cetera?
01:47:22.580 | Well, you know, what would happen is,
01:47:24.620 | a human would have an idea,
01:47:26.380 | and then the computer would kind of dress up that idea.
01:47:29.140 | And then, you know, a bunch of years go by,
01:47:31.380 | and I talk to people, you know,
01:47:33.580 | who are composers and things, and they say,
01:47:35.060 | "Oh yeah, I really like that Wolfram Tone site."
01:47:37.780 | Okay, that's nice.
01:47:39.300 | They say, "It's a very good place for me to get ideas."
01:47:42.740 | So that's sort of the opposite of what I would have expected,
01:47:46.420 | namely, what's happening is, you know,
01:47:49.140 | human comes here, you know,
01:47:51.100 | listens to some 10 second fragment,
01:47:54.740 | and they say, "Oh, that's an interesting idea."
01:47:56.940 | And then they kind of embellish it
01:47:59.260 | using kind of something that is humanly meaningful.
01:48:02.220 | But it's like, you know, you're taking a photograph,
01:48:04.740 | and you see some interesting configuration,
01:48:07.100 | and then kind of you're, you know,
01:48:08.900 | you're filling that with kind of some human sort of context.
01:48:13.660 | But so I'm not quite sure what,
01:48:20.700 | what you were asking about.
01:48:21.740 | I mean, back to the Newton quote,
01:48:24.060 | the thing that I think is another way
01:48:27.100 | to think about that quote is us humans, you know,
01:48:31.020 | with our sort of historical development of, you know,
01:48:34.900 | our intellectual history have explored
01:48:38.140 | this very small corner of what's possible
01:48:40.740 | in the computational universe.
01:48:42.460 | And everything that we care about
01:48:44.580 | is contained in the small corner.
01:48:46.740 | And that means that, you know, you could say,
01:48:49.260 | "Well, gee, you know, I want to, you know,
01:48:52.620 | what we end up wanting to talk about
01:48:56.420 | are the things that we as a society
01:48:58.180 | have decided we care about."
01:49:00.020 | And what, there's an interesting feedback loop,
01:49:02.100 | I'll just mention, it should end,
01:49:03.460 | but so you might say, so here's a funny thing.
01:49:08.420 | So let's take language, for example.
01:49:10.580 | Language evolves, we say, we make up language
01:49:13.540 | to describe what we see in the world, okay?
01:49:16.260 | Fine, that's a fine idea.
01:49:18.020 | Imagine the, you know, in Paleolithic times,
01:49:20.580 | people would make up language.
01:49:21.820 | They probably didn't have a word for table
01:49:24.020 | because they didn't have any tables.
01:49:26.300 | They probably had a word for rock.
01:49:28.300 | But then we end up as a result of the particular,
01:49:32.700 | you know, development that our civilization
01:49:35.780 | has gone through, we build tables.
01:49:38.100 | And there was sort of a synergy
01:49:40.540 | between coming up with a word for table
01:49:42.860 | and deciding tables were a thing
01:49:44.420 | and we should build a bunch of them.
01:49:46.220 | And so there's this sort of complicated interplay
01:49:48.500 | between the things that we learn how to describe
01:49:51.060 | and how to think about, the things that we build
01:49:53.340 | and put in our environment, and then the things
01:49:55.740 | that we end up wanting to talk about
01:49:58.700 | because they're things that we have experience of
01:50:00.780 | in our environment.
01:50:01.940 | And so that's the, you know, I think as we look
01:50:03.780 | at sort of the progress of civilization,
01:50:05.740 | there's, you know, there's various layers of,
01:50:08.060 | first we, you know, we invent a thing
01:50:10.580 | that we can then think about and talk about.
01:50:13.260 | Then we build an environment based on that.
01:50:16.580 | Then that allows us to do more stuff
01:50:18.780 | and we build on top of that.
01:50:19.780 | And that's why, for example, when we talk
01:50:21.180 | about computational thinking and teaching it to kids
01:50:23.140 | and so on, that's one reason that's kind of important
01:50:26.260 | because we're building a layer of things
01:50:28.780 | that people are then familiar with that's different
01:50:30.820 | from what we've had so far.
01:50:32.500 | And they give people a way to talk about things.
01:50:33.980 | I'll give you one example that,
01:50:36.180 | let's see, did I have that still up?
01:50:37.980 | The, yeah, okay, one example here.
01:50:42.580 | (keyboard clicking)
01:50:45.580 | From this blog post of mine, actually.
01:50:49.020 | So the, where is it?
01:50:53.660 | Okay, so that thing there is a nested pattern.
01:50:58.660 | You know, it's a Sapinski.
01:51:01.220 | That tile pattern was created in 1210 AD, okay?
01:51:10.180 | And it's the first example I know of a fractal pattern.
01:51:13.220 | Okay, well, the art historians wrote about these patterns.
01:51:18.220 | There are a bunch of this particular style of pattern.
01:51:20.620 | They wrote about these for years.
01:51:22.620 | They never discussed that nested pattern.
01:51:24.860 | These patterns also have, you know, pictures of lions
01:51:27.100 | and, you know, elephants and things like that in them.
01:51:29.860 | They wrote about those kinds of things,
01:51:31.620 | but they never mentioned the nested pattern
01:51:33.820 | until basically about 25 years ago
01:51:37.620 | when fractals and so on became a thing.
01:51:40.660 | And then it's, ah, I can now talk about that.
01:51:42.940 | It's a nested pattern, it's a fractal.
01:51:44.980 | And then, you know, before that time,
01:51:47.100 | the art historians were blind
01:51:49.020 | to that particular part of this pattern.
01:51:50.980 | It was just like, I don't know what that is.
01:51:52.700 | But there's no, you know, I don't have a word to describe it.
01:51:54.820 | I'm not going to, I'm not gonna talk about it.
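For reference, the nested pattern he's pointing at is the same structure one of the cellular automata discussed earlier in the talk produces; a two-line sketch:

    (* rule 90 grown from a single cell gives the nested Sierpinski pattern *)
    ArrayPlot[CellularAutomaton[90, {{1}, 0}, 64]]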
01:51:58.780 | So that's a, you know, it's part of this feedback loop
01:52:00.700 | of things that we learn to describe,
01:52:04.100 | then we build in terms of those things,
01:52:05.780 | then we build another layer.
01:52:07.380 | I think one of the things, I mean, you talk about,
01:52:09.620 | you know, just in the sort of,
01:52:11.860 | the thing, one thing I'm really interested in
01:52:14.620 | is the evolution of purposes.
01:52:16.500 | So, you know, if you look back in human history,
01:52:18.740 | there's a, you know, what was thought to be worth doing
01:52:21.740 | a thousand years ago is different
01:52:22.980 | from what's thought to be worth doing today.
01:52:25.460 | And part of that is, you know,
01:52:29.420 | good examples of things like, you know,
01:52:30.900 | walking on a treadmill or buying goods in virtual worlds.
01:52:35.220 | Both of these are hard to explain
01:52:36.900 | to somebody from a thousand years ago,
01:52:39.060 | because each one ends up being a whole sort of societal story
01:52:42.260 | about we're doing this because we do that,
01:52:43.740 | because we do that.
01:52:44.820 | And so the question is,
01:52:45.660 | how will these purposes evolve in the future?
01:52:48.460 | And I think one of the things that I view
01:52:50.100 | as a sort of sobering thought is that,
01:52:52.260 | actually one thing I found rather disappointing
01:52:56.740 | and then I became less pessimistic about it is,
01:52:59.100 | if you think about the future of the human condition,
01:53:01.340 | and, you know, we've been successful
01:53:02.780 | in making our AI systems and we can read out brains
01:53:05.540 | and we can upload consciousnesses and things like that.
01:53:07.980 | And we've eventually got this box
01:53:09.660 | with trillions of souls in it.
01:53:11.620 | And the question is, what are these souls doing?
01:53:13.900 | And to us today, it looks like they're playing video games
01:53:17.700 | for the rest of eternity, right?
01:53:19.540 | And that seems like a kind of a bad outcome.
01:53:21.700 | It's like, we've gone through all of this long history
01:53:24.380 | and what do we end up with?
01:53:25.260 | We end up with a trillion souls
01:53:27.020 | in a box playing video games, okay?
01:53:29.420 | And I thought this is a very, you know,
01:53:32.300 | depressing outcome, so to speak.
01:53:34.540 | And then I realized that actually, you know,
01:53:36.940 | if you look at the sort of arc of human history,
01:53:39.780 | people at any given time in history,
01:53:43.100 | people have been, you know, they've,
01:53:46.820 | my main conclusion is that any time in history,
01:53:51.460 | the things people do seem meaningful and purposeful
01:53:54.700 | to them at that time in history and history moves on.
01:53:59.100 | And, you know, like a thousand years ago,
01:54:00.980 | there were a lot of purposes that people had
01:54:04.460 | that, you know, were to do with weird superstitions
01:54:07.260 | and things like that that we say,
01:54:08.780 | why the heck were you doing that?
01:54:10.100 | That just doesn't make any sense, right?
01:54:12.420 | But to them at that time, it made all the sense in the world.
01:54:15.820 | And I think that, you know,
01:54:16.940 | the thing that makes me sort of less depressed
01:54:18.740 | about the future, so to speak,
01:54:20.420 | is that at any given time in history, you know,
01:54:23.580 | you can still have meaningful purposes,
01:54:26.420 | even though they may not look meaningful
01:54:28.100 | from a different point in history.
01:54:29.260 | And that there's sort of a whole theory
01:54:30.660 | you can kind of build up based on kind of the trajectories
01:54:33.900 | that you follow through the space of purposes
01:54:36.260 | and sort of interesting, if you can't jump, like you say,
01:54:39.460 | let's get cryonically frozen for, you know, 300 years,
01:54:43.020 | and then, you know, be back again.
01:54:46.100 | The interesting case is then, you know,
01:54:48.900 | all the purposes that you sort of, you know,
01:54:52.100 | that you find yourself in,
01:54:53.940 | ones that have any continuity with what we know today.
01:54:56.740 | I should stop with that.
01:54:57.580 | - That's a beautiful way to end it.
01:54:59.300 | Please give Steven a big hand.
01:55:00.940 | (audience applauding)