MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)

00:00:00.000 | Today we have Josh Tenenbaum.

00:00:02.000 | He's a professor here at MIT,

00:00:04.320 | leading the computational cognitive science group.

00:00:07.040 | Among many other topics in cognition and intelligence,

00:00:11.000 | he is fascinated with the question of how human beings learn

00:00:14.280 | so much from so little.

00:00:16.280 | And how these insights can lead to build AI systems

00:00:19.680 | that are much more efficient learning from data.

00:00:22.000 | So please give Josh a warm welcome.

00:00:25.000 | [Applause]

00:00:31.000 | All right. Thank you very much.

00:00:33.000 | Thanks for having me.

00:00:35.000 | I'm excited to be part of what looks like

00:00:37.000 | really quite a very impressive lineup,

00:00:39.000 | especially starting after today.

00:00:41.000 | And it's, I think, quite a great opportunity

00:00:44.000 | to get to see perspectives on artificial intelligence

00:00:47.000 | from many of the leaders in industry

00:00:49.000 | and other entities working on this great quest.

00:00:53.000 | So I'm going to talk to you about some of the work

00:00:55.000 | that we do in our group, but also I'm going to try to give

00:00:58.000 | a broader perspective reflective of a number of MIT faculty,

00:01:01.000 | especially those who are affiliated with

00:01:03.000 | the Center for Brains, Minds, and Machines.

00:01:05.000 | So you can see up there on my affiliation.

00:01:07.000 | Academically, I'm part of Brain and Cognitive Science,

00:01:09.000 | or Course 9. I'm also part of CSAIL.

00:01:12.000 | But I'm also part of the Center for Brains, Minds, and Machines,

00:01:15.000 | which is an NSF-funded center, science and technology center,

00:01:18.000 | which really stands for the bridge between

00:01:20.000 | the science and the engineering of intelligence.

00:01:22.000 | It literally straddles Vassar Street

00:01:24.000 | in that we have CSAIL and BCS members.

00:01:26.000 | We also have partners at Harvard and other academic institutions.

00:01:29.000 | And again, what we stand for, I want to try to convey

00:01:31.000 | some of the specific things we're doing in the center

00:01:34.000 | and where we want to go with a vision that really is about

00:01:37.000 | jointly pursuing the science, the basic science

00:01:40.000 | of how intelligence arises in the human mind and brain,

00:01:43.000 | and also the engineering enterprise

00:01:46.000 | of how to build something increasingly like

00:01:48.000 | human intelligence in machines.

00:01:50.000 | And we deeply believe that these two projects

00:01:52.000 | have something to do with each other

00:01:54.000 | and are best pursued jointly.

00:01:56.000 | Now, it's a really exciting time to be doing anything

00:01:58.000 | related to intelligence or certainly to AI

00:02:00.000 | for all the reasons that, you know, brought you all here.

00:02:03.000 | I don't have to tell you this.

00:02:05.000 | We have all these ways in which AI is kind of finally here.

00:02:08.000 | We finally live in the era of something like real practical AI.

00:02:11.000 | Or for those who've been around for a while

00:02:14.000 | and have seen some of the rises and falls,

00:02:16.000 | you know, AI is back in a big way.

00:02:18.000 | But from my perspective, and I think maybe this reflects,

00:02:21.000 | you know, why we distinguish what we might call AGI from AI,

00:02:24.000 | we don't really have any real AI, basically.

00:02:27.000 | We have what I like to call AI technologies,

00:02:29.000 | which are systems that do things we used to think

00:02:32.000 | that only humans could do, and now we have machines

00:02:34.000 | that do them, often quite well, maybe even better

00:02:37.000 | than any human who's ever lived, right,

00:02:39.000 | like a machine that plays Go.

00:02:41.000 | But none of these systems, I would say,

00:02:43.000 | are truly intelligent.

00:02:45.000 | None of them have anything like common sense.

00:02:47.000 | They have nothing like the flexible, general-purpose

00:02:49.000 | intelligence that each of you might use

00:02:51.000 | to learn every one of these skills or tasks, right?

00:02:54.000 | Each of these systems had to be built

00:02:56.000 | by large teams of engineers,

00:02:58.000 | working together often for a number of years,

00:03:00.000 | often at great cost to somebody who's willing to pay for it.

00:03:03.000 | And each of them just does one thing.

00:03:05.000 | So AlphaGo might beat the world's best,

00:03:07.000 | but it can't drive to the match

00:03:09.000 | or even tell you what Go is.

00:03:12.000 | It can't even tell you that Go is a game

00:03:14.000 | because it doesn't even know what a game is, right?

00:03:16.000 | So what's missing?

00:03:18.000 | What is it that makes every one of your brains--

00:03:20.000 | maybe you can't beat the world's best in Go,

00:03:23.000 | but any one of you can get behind the wheel of a car.

00:03:26.000 | I think of this because my daughter

00:03:28.000 | is going to turn 16 tomorrow.

00:03:30.000 | If she lived in California, she'd have a driver's license.

00:03:33.000 | It's a little bit down the line for us here in Massachusetts.

00:03:36.000 | But she didn't have to be specially engineered

00:03:39.000 | by billion-dollar startups,

00:03:41.000 | and she got really into chess recently,

00:03:43.000 | and now she's taught herself chess

00:03:45.000 | by playing just a handful of games, basically.

00:03:48.000 | And she can do any one of these activities,

00:03:50.000 | and any one of us can.

00:03:52.000 | So what is it? What makes up the difference?

00:03:54.000 | Well, there's many things, right?

00:03:56.000 | I'll talk about the focus for us in our research,

00:04:00.000 | and a lot of us, again, in CBMM, is summarized here.

00:04:04.000 | What drives the successes right now in AI,

00:04:08.000 | especially in industry, okay,

00:04:10.000 | in all these AI technologies, is many, many things, many things.

00:04:13.000 | But where the progress has been made most recently

00:04:16.000 | and what's getting most of the attention

00:04:18.000 | is, of course, deep learning,

00:04:20.000 | but other kinds of machine-learning technologies

00:04:22.000 | which essentially represent the maturation

00:04:24.000 | of a decades-long effort

00:04:26.000 | to solve the problem of pattern recognition.

00:04:28.000 | That means taking data and finding patterns in the data

00:04:31.000 | that tell you something you care about,

00:04:33.000 | like how to label a class

00:04:35.000 | or how to predict some other signal, okay?

00:04:37.000 | And pattern recognition is great.

00:04:39.000 | It's an important part of intelligence,

00:04:41.000 | and it's reasonable to say that deep learning as a technology

00:04:45.000 | has really made great strides on pattern recognition

00:04:48.000 | and maybe even, you know, has come in close

00:04:50.000 | to solving the problems of pattern recognition.

00:04:53.000 | But intelligence is about many other things.

00:04:55.000 | Intelligence is about a lot more.

00:04:57.000 | In particular, it's about modeling the world.

00:05:00.000 | And think about all the activities that a human does

00:05:03.000 | to model the world that go beyond just, say,

00:05:05.000 | recognizing patterns in data,

00:05:07.000 | but actually trying to explain and understand

00:05:09.000 | what we see, for instance, okay?

00:05:11.000 | Or to be able to imagine things that we've never seen,

00:05:14.000 | that never seen, maybe even very different

00:05:16.000 | from anything we've ever seen, but might want to see,

00:05:19.000 | and then to set those as goals,

00:05:21.000 | to make plans and solve problems

00:05:23.000 | needed to make those things real.

00:05:25.000 | Or thinking about learning, again, you know,

00:05:27.000 | some kinds of learning can be thought of as pattern recognition

00:05:30.000 | if you're learning sufficient statistics

00:05:32.000 | or weights in a neural net that are used for those purposes.

00:05:35.000 | But many activities of learning are about building out new models,

00:05:38.000 | right, either refining, reusing, improving old models,

00:05:41.000 | or actually building fundamentally new models

00:05:43.000 | as you experience more of the world.

00:05:45.000 | And then think about sharing our models,

00:05:47.000 | communicating our models to others,

00:05:49.000 | modeling their models, learning from them.

00:05:51.000 | All these activities of modeling,

00:05:53.000 | these are at the heart of human intelligence,

00:05:55.000 | and it requires a much broader set of tools.

00:05:58.000 | So I want to talk about the ways we're studying

00:06:00.000 | these activities of modeling the world,

00:06:02.000 | and something in a pretty non-technical way

00:06:04.000 | about what are the kind of tools

00:06:06.000 | that we need to capture these abilities.

00:06:08.000 | Now, I think it's-- I want to be very honest up front

00:06:11.000 | and to say this is just the beginning of a story, right?

00:06:14.000 | When you look at deep learning successes,

00:06:16.000 | that itself is a story that goes back decades.

00:06:18.000 | I'll say a little bit about that history in a minute.

00:06:20.000 | But where we are now is just looking forward to a future

00:06:23.000 | when we might be able to capture these abilities,

00:06:25.000 | you know, at a really mature engineering scale.

00:06:27.000 | And I would say we are far from being able to capture

00:06:30.000 | all the ways in which humans richly, flexibly, quickly

00:06:33.000 | build models of the world at the kind of scale

00:06:35.000 | that, say, Silicon Valley wants,

00:06:37.000 | either big tech companies like Google or Microsoft

00:06:39.000 | or IBM or Facebook or small startups, right?

00:06:42.000 | We can get there.

00:06:44.000 | And I think what I want to talk to you about here

00:06:47.000 | is one route for trying to get there,

00:06:49.000 | and this is the route that CBMM stands for,

00:06:51.000 | the idea that by reverse engineering

00:06:53.000 | how intelligence works in the human mind and brain,

00:06:55.000 | that will give us a route

00:06:57.000 | to engineering these abilities in machines.

00:06:59.000 | When we say reverse engineering, we're talking about science,

00:07:02.000 | but doing science like engineers.

00:07:04.000 | That's our fundamental principle,

00:07:06.000 | that if we approach cognitive science and neuroscience

00:07:08.000 | like an engineer, where, say, the output of our science

00:07:10.000 | isn't just a description of the brain or the mind in words,

00:07:13.000 | but in the same terms that an engineer would use

00:07:15.000 | to build an intelligent system,

00:07:17.000 | then that will be both the basis for a much more rigorous

00:07:19.000 | and deeply insightful science,

00:07:21.000 | but also direct translation of those insights

00:07:23.000 | into engineering applications.

00:07:25.000 | Now, I said before I talk a little about history,

00:07:28.000 | what I mean by that is this.

00:07:30.000 | Again, if part of what brought you here is deep learning,

00:07:33.000 | and I know even if you've never heard of deep learning before,

00:07:35.000 | which I'm sure is unlikely,

00:07:37.000 | you saw a good spectrum of that

00:07:39.000 | in the overview session last night.

00:07:42.000 | It's really interesting and important

00:07:45.000 | to look back on the history of where did techniques

00:07:47.000 | for deep learning come from, or reinforcement learning.

00:07:50.000 | Those are the two tools in the current machine learning arsenal

00:07:53.000 | that are getting the most attention,

00:07:55.000 | things like back propagation or end-to-end stochastic gradient descent

00:07:58.000 | or temporal difference learning or Q-learning.

00:08:00.000 | Here's a few papers from the literature.

00:08:02.000 | Maybe some of you have read these original papers.

00:08:04.000 | Here's the original paper by Rumelhart, Hinton, and colleagues

00:08:07.000 | in which they introduced the back propagation algorithm

00:08:09.000 | for training multilayer perceptrons,

00:08:11.000 | multilayer neural networks.

00:08:13.000 | Here's the original perceptron paper by Rosenblatt,

00:08:15.000 | which introduced the one-layer version of that architecture

00:08:17.000 | and the basic perceptron learning algorithm.

00:08:20.000 | Here's the first paper on the temporal difference learning method

00:08:24.000 | for reinforcement learning from Sutton and Bartow.

00:08:27.000 | Here's the original Boltzmann machine paper,

00:08:29.000 | also by Hinton and colleagues,

00:08:31.000 | which, again, those of you who don't know that architecture,

00:08:34.000 | think of a kind of probabilistic, undirected, multilayer perceptron.

00:08:39.000 | Or, for example, before there were LSTMs,

00:08:42.000 | if you know about current recurrent neural network architecture,

00:08:45.000 | earlier, much simpler versions of the same idea

00:08:47.000 | were proposed by Jeff Ellman and his simple recurrent networks.

00:08:50.000 | The reason I want to put up the original papers here

00:08:52.000 | is for you to look at both when they were published

00:08:55.000 | and where they were published.

00:08:57.000 | So if you look at the dates, you'll see papers going back

00:08:59.000 | to the '80s, but even the '60s, or even the 1950s.

00:09:03.000 | And look at where they were published.

00:09:05.000 | Most of them were published in psychology journals.

00:09:08.000 | So the journal Psychological Review, if you don't know it,

00:09:10.000 | is like the leading journal of theoretical psychology

00:09:12.000 | and mathematical psychology.

00:09:14.000 | Or Cognitive Science, the journal of the Cognitive Science Society.

00:09:17.000 | Or the backprop paper was published in Nature,

00:09:20.000 | which is a general interest science journal,

00:09:22.000 | but by people who are mostly affiliated with

00:09:24.000 | the Institute for Cognitive Science in San Diego.

00:09:26.000 | So what you see here is already a long history

00:09:29.000 | of scientists thinking like engineers.

00:09:31.000 | These are people who are in psychology or cognitive science departments

00:09:34.000 | and publishing in those places, but by formalizing

00:09:37.000 | even very basic insights about how humans might learn,

00:09:40.000 | or how brains might learn, in the right kind of math,

00:09:44.000 | that led to, of course, progress on the science side,

00:09:47.000 | but it led to all the engineering that we see now.

00:09:49.000 | It wasn't sufficient, right?

00:09:51.000 | We needed, of course, lots of innovations and advances

00:09:54.000 | in computing hardware and software systems, right?

00:09:57.000 | But this is where the basic math came from,

00:10:00.000 | and it came from doing science like an engineer.

00:10:03.000 | So what I want to talk about in our vision is,

00:10:05.000 | what does the future of this look like?

00:10:07.000 | If we were to look 50 years into the future,

00:10:09.000 | what would we be looking back on now, or over this time scale?

00:10:12.000 | Well, here's a long-term research roadmap that reflects

00:10:15.000 | some of my ambitions and some of our center's goals,

00:10:18.000 | and many others, too.

00:10:20.000 | We'd like to be able to address basic questions,

00:10:22.000 | questions of what it is to be and to think like a human,

00:10:25.000 | questions, for example, of consciousness,

00:10:27.000 | or meaning and language, or real learning, right?

00:10:30.000 | Questions like, even beyond the individual,

00:10:33.000 | like questions of culture, or creativity.

00:10:36.000 | Those are our big ideas up there.

00:10:38.000 | And for each of these, there are basic scientific questions.

00:10:40.000 | How do we become aware of the world and ourselves in it?

00:10:43.000 | It starts with perception, but it really turns into awareness,

00:10:46.000 | awareness of yourself and of the world,

00:10:48.000 | and what we might call consciousness, right?

00:10:50.000 | How does a word start to have a meaning?

00:10:52.000 | What really is a meaning, and how does a child grasp it?

00:10:54.000 | Or how do children actually learn?

00:10:56.000 | What do babies' brains actually start with?

00:10:58.000 | Are they blank slates, or do they start with some kind of

00:11:00.000 | cognitive structure?

00:11:01.000 | And then what does real learning look like?

00:11:03.000 | These are just some of the questions that we're interested

00:11:05.000 | in working on.

00:11:06.000 | Or when we talk about culture, we mean,

00:11:08.000 | how do you learn all the things you didn't directly experience,

00:11:11.000 | right, but that somehow you got from the accumulation of

00:11:13.000 | knowledge in society over many generations?

00:11:16.000 | Or how do you ever think of new ideas,

00:11:18.000 | or answers to new questions?

00:11:19.000 | How do you think of the new questions themselves?

00:11:21.000 | How do you decide what to think about?

00:11:23.000 | These are all key activities of human intelligence.

00:11:25.000 | When we talk about how we model the world,

00:11:27.000 | where our models come from, what we do with our models,

00:11:29.000 | this is what we're talking about.

00:11:31.000 | And if we could get machines that could do these things,

00:11:33.000 | well, again, on the bottom row, think of all the actual

00:11:35.000 | real engineering payoffs.

00:11:36.000 | Now, in our center, in both my own activities and a lot of what

00:11:40.000 | my group does these days, and what a number of other

00:11:42.000 | colleagues in the Center for Brains, Minds, and Machines do,

00:11:45.000 | as well as, you know, very broadly people in BCS and CSAIL,

00:11:48.000 | one place where we work on the beginnings of these problems,

00:11:50.000 | in the near term, this is the long term,

00:11:52.000 | like think 50 years, okay, maybe shorter, maybe longer,

00:11:55.000 | I don't know, but think well beyond 10 years.

00:11:59.000 | But in the short term, 5 to 10 years,

00:12:01.000 | a lot of our focus is around visual intelligence,

00:12:03.000 | and there's many reasons for that.

00:12:05.000 | Again, we can build on the successes of deep networks,

00:12:07.000 | and a lot of pattern recognition and machine vision.

00:12:10.000 | It's a good way to put these ideas into practice.

00:12:12.000 | When we look at the actual brain,

00:12:14.000 | the visual system in the brain, in the human and other

00:12:17.000 | mammalian brains, for example, is really,

00:12:19.000 | very clearly the best understood part of the brain,

00:12:21.000 | and at a circuit level, it's the part of the brain

00:12:23.000 | that's most inspired current deep learning

00:12:26.000 | and neural network systems.

00:12:28.000 | But even there, there's things which we still don't

00:12:30.000 | really understand like engineers.

00:12:32.000 | So here's an example of a basic problem in visual intelligence

00:12:35.000 | that we and others in the Center are trying to solve.

00:12:39.000 | Look around you, and you feel like there's a whole world

00:12:43.000 | around you, and there is a whole world around you,

00:12:45.000 | and you feel like your brain captures it.

00:12:47.000 | But what the actual sense data that's coming in through

00:12:50.000 | your eyes looks more like this photograph here,

00:12:52.000 | where you can see there's a crowd scene,

00:12:54.000 | but it's mostly blurry except for a small region

00:12:56.000 | of high resolution in the center.

00:12:58.000 | So that corresponds biologically to what part of the image

00:13:00.000 | is in your fovea.

00:13:02.000 | That's the central region of cells in the retina,

00:13:04.000 | where you have really high resolution visual data.

00:13:07.000 | The size of your fovea is roughly like if you hold out

00:13:09.000 | your thumb at arm's length, it's a little bit bigger

00:13:11.000 | than that, but not much bigger.

00:13:13.000 | Most of the image, in terms of the actual information

00:13:16.000 | coming in in a bottom-up sense to your brain,

00:13:18.000 | is really quite blurry.

00:13:20.000 | But somehow by looking at just one part,

00:13:22.000 | and then by saccading around or making a few eye movements,

00:13:25.000 | you get a few glimpses, each not much bigger than the size

00:13:27.000 | of your thumb at arm's length.

00:13:29.000 | Somehow you stitch that information together

00:13:31.000 | into what feels like and really is a rich representation

00:13:34.000 | of the whole world around you.

00:13:36.000 | And when I say around you, I mean literally around you.

00:13:38.000 | So here's another kind of demonstration.

00:13:41.000 | Without turning around, nobody's allowed to turn around,

00:13:44.000 | ask yourself, what's behind you?

00:13:46.000 | Now the answer's going to be different for different people,

00:13:48.000 | depending on where you're sitting.

00:13:50.000 | For most of you, you might think, well,

00:13:52.000 | I think there's a person pretty close behind me.

00:13:55.000 | You know you're in a crowded auditorium,

00:13:56.000 | although you haven't seen that person.

00:13:58.000 | You know that they're there.

00:14:00.000 | For people in the very back row, you know there isn't

00:14:02.000 | a person behind you, and you're conscious of being

00:14:04.000 | in the back row.

00:14:05.000 | You might be conscious that there's a wall right behind you.

00:14:07.000 | But now for the people who are in the room,

00:14:10.000 | not in the very back, think about how far behind you

00:14:13.000 | is the back, like where's the nearest wall behind you?

00:14:15.000 | So we can, maybe we can call out,

00:14:17.000 | try a little demonstration.

00:14:18.000 | So I don't know, I'm pointing to someone there.

00:14:20.000 | Can you see, say something if you think I'm pointing at you.

00:14:24.000 | Well, I could have been pointing at you,

00:14:25.000 | but I'm pointing to someone behind you, okay.

00:14:27.000 | I'll point to you, yeah, I'm pointing to you.

00:14:29.000 | All right, so how far is the nearest wall?

00:14:31.000 | No, you can't turn around, you've blown your chance.

00:14:33.000 | (audience laughing)

00:14:34.000 | Without turning around, okay, so you were,

00:14:36.000 | okay, do you see I'm pointing to you there with the tie?

00:14:38.000 | Okay, so without turning around,

00:14:40.000 | how far is the nearest wall behind you?

00:14:42.000 | That's, sorry, how far?

00:14:48.000 | Five meters, okay, well, I mean,

00:14:50.000 | that might be about right.

00:14:51.000 | No, no, other people can turn around.

00:14:54.000 | Now, how about you, how far is the nearest wall behind you?

00:14:57.000 | 10 meters, okay, that might be right, yeah.

00:15:03.000 | How about here, what do you think?

00:15:06.000 | 20, okay, so yeah, since I didn't grow up

00:15:09.000 | in the metric system, I barely know, but yeah, I mean,

00:15:11.000 | I mean, the point is that like,

00:15:13.000 | you're, each of you is surely not exactly right,

00:15:17.000 | but you're certainly within an order of magnitude,

00:15:19.000 | and I guess if we actually tried to measure,

00:15:21.000 | you know, you're probably, my guess is you're probably right

00:15:23.000 | within, you know, 50% or less,

00:15:25.000 | often, you know, maybe just 20% error.

00:15:28.000 | Okay, so how do you know this?

00:15:29.000 | I mean, even if it's not, what did you say, 20 meters?

00:15:31.000 | Even if it's not 20 meters, it's probably closer to 20 meters

00:15:34.000 | than it is to five or 10 meters,

00:15:36.000 | and then it is to 50 meters, so how do you know this?

00:15:38.000 | You haven't turned around in a while, right?

00:15:40.000 | But some part of your brain is tracking

00:15:42.000 | the whole world around you, right?

00:15:44.000 | And how many people are behind you?

00:15:46.000 | Yeah, like a few hundred, right?

00:15:49.000 | I mean, I don't know if it's 200 or 300,

00:15:51.000 | but it's not 1,000, I don't think so,

00:15:54.000 | and it's certainly not 10 or 20 or 50, right?

00:15:57.000 | So you track these things,

00:15:59.000 | and you use them to plan your actions.

00:16:01.000 | Okay, so again, think about how instantly, effortlessly,

00:16:04.000 | and very reliably, okay,

00:16:06.000 | your brain computes all these things,

00:16:08.000 | so the people and objects around you,

00:16:10.000 | and it's not just, you know, approximations.

00:16:12.000 | Certainly, when we're talking about what's behind you in space,

00:16:15.000 | there's a lot of imprecision,

00:16:17.000 | but when it comes to reaching for things right in front of you,

00:16:20.000 | very precise shape and physical property estimates

00:16:22.000 | needed to pick up and manipulate objects,

00:16:24.000 | and then when it comes to people,

00:16:26.000 | it's not just the existence of the people,

00:16:28.000 | but something about what's in their head, right?

00:16:30.000 | You track whether someone's paying attention to you

00:16:32.000 | when you're talking to them, what they might want from you,

00:16:34.000 | what they might be thinking about you,

00:16:36.000 | what they might be thinking about other people, okay?

00:16:38.000 | So when we talk about visual intelligence,

00:16:40.000 | this is the whole stuff we're talking about,

00:16:42.000 | and you can start to see how it turns into basic questions,

00:16:45.000 | I think, of what we might call the beginnings of consciousness,

00:16:49.000 | or at least our awareness of ourself in the world,

00:16:52.000 | and of ourselves as a self in the world,

00:16:55.000 | but also other aspects of higher-level intelligence

00:16:57.000 | and cognition that are not just about perception,

00:16:59.000 | like symbols, right, to describe, even to ourselves,

00:17:02.000 | what's around us and where we are and what we can do with it.

00:17:05.000 | You have to go beyond just what we would normally call

00:17:08.000 | the stuff of perception to, say, the thoughts in somebody's head

00:17:11.000 | and your own thoughts about that, okay?

00:17:13.000 | So what we've been doing in CBMM

00:17:15.000 | is trying to develop an architecture

00:17:17.000 | for visual intelligence,

00:17:19.000 | and I'm not going to go into any of the details of how this works,

00:17:22.000 | and this is just notional, this is just a picture,

00:17:24.000 | it's like just a sketch from a grant proposal

00:17:26.000 | of what we say we want to do,

00:17:28.000 | but it's based on a lot of scientific understanding

00:17:30.000 | of how the brain works.

00:17:32.000 | There are different parts of the brain that correspond

00:17:34.000 | to these different modules in our architecture,

00:17:36.000 | as well as some kind of emerging engineering way

00:17:38.000 | to try to capture at the software

00:17:40.000 | and maybe even hardware levels how these modules might work.

00:17:43.000 | So we talk about sort of an early module

00:17:45.000 | of a visual or perceptual stream,

00:17:47.000 | which is like bottom-up visual or other perceptual input.

00:17:50.000 | That's the kind of thing that is pretty close

00:17:52.000 | to what we currently have in, say,

00:17:54.000 | deep convolutional neural networks.

00:17:56.000 | But then we talk about some kind of--

00:17:58.000 | the output of that isn't just pattern class labels,

00:18:00.000 | but what we call the cognitive core,

00:18:02.000 | core cognition.

00:18:04.000 | So again, an understanding of space and objects,

00:18:06.000 | their physics, other people, their minds,

00:18:08.000 | that's the real stuff of cognition

00:18:10.000 | that has to be the output of perception.

00:18:12.000 | But somehow we have to have--

00:18:14.000 | this is what we call the brain OS in this picture.

00:18:17.000 | We have to get there by stitching together

00:18:19.000 | the bottom-up inputs from a glimpse here,

00:18:21.000 | a glimpse here, a little bit here and there,

00:18:23.000 | and accessing prior knowledge

00:18:25.000 | from our memory systems to tell us

00:18:27.000 | how to stitch these things together

00:18:29.000 | into the really core cognitive representations

00:18:31.000 | of what's out there in the world.

00:18:33.000 | And then if we're going to start to talk about it in language

00:18:36.000 | or to build plans on top of what we have seen and understood,

00:18:40.000 | that's where we talk about symbols coming into the picture,

00:18:44.000 | the building blocks of language and plans and so on.

00:18:48.000 | So now we might say, well, okay, this is an architecture

00:18:52.000 | that is brain-inspired and cognitively inspired,

00:18:55.000 | and we're planning to turn into real engineering.

00:18:57.000 | And you can say, well, do we need that?

00:18:59.000 | Maybe--again, I know this is a question

00:19:01.000 | you considered in the first lecture.

00:19:03.000 | Maybe the engineering toolkit that's currently

00:19:05.000 | been making a lot of progress in, let's say, industry,

00:19:07.000 | maybe that's good enough.

00:19:09.000 | Maybe let's take deep learning, but to stand for a broader set

00:19:12.000 | of modern pattern-recognition-based

00:19:14.000 | and reinforcement-learning-based tools

00:19:16.000 | and say, okay, well, maybe that can scale up to this.

00:19:20.000 | And you might--maybe that's possible.

00:19:22.000 | I'm happy in the question period if people want to debate this.

00:19:25.000 | My sense is no.

00:19:27.000 | I think that it's not--when I say no,

00:19:29.000 | I don't mean, like, it can't happen or it won't happen.

00:19:32.000 | What I mean is the highest value,

00:19:34.000 | the highest expected route right now

00:19:36.000 | is to take this more science-based

00:19:38.000 | reverse engineering approach, and that at least

00:19:40.000 | if you follow the current trajectory

00:19:42.000 | that industry incentives especially optimize for,

00:19:45.000 | it's not even really trying to take us to these things.

00:19:48.000 | So think about, for example, a case study

00:19:50.000 | of visual intelligence that is in some ways

00:19:52.000 | as pattern recognition very much of a success.

00:19:54.000 | It's, again, been mostly driven by industry.

00:19:56.000 | It's something that if you read in the news

00:19:59.000 | or even play around with in certain publicly available

00:20:02.000 | data sets, feels like we've made great progress.

00:20:04.000 | And this is an aspect of visual intelligence,

00:20:07.000 | which is sometimes called image captioning

00:20:09.000 | or mapping images to text.

00:20:12.000 | You know, basically there's been a bunch of systems.

00:20:15.000 | Here's a couple of press releases.

00:20:17.000 | This one's about Google.

00:20:19.000 | Google's AI can now capture images

00:20:21.000 | almost as well as humans.

00:20:23.000 | Here's one about Microsoft.

00:20:25.000 | A couple of years ago, I think, there were something

00:20:27.000 | like eight papers all released onto archive

00:20:29.000 | around the same time from basically all the major

00:20:31.000 | industry computer vision groups as well as a couple

00:20:33.000 | of academic partners, which all driven

00:20:36.000 | by basically the same data set produced

00:20:38.000 | by some Microsoft researchers and other collaborators,

00:20:41.000 | trained a combination of deep convolutional neural networks,

00:20:44.000 | state-of-the-art visual pattern recognition,

00:20:47.000 | with recurrent neural networks, which had recently

00:20:49.000 | been developed for basically kinds of neural

00:20:51.000 | statistical language modeling, glued them together

00:20:54.000 | and produced a system which made very impressive results

00:20:57.000 | in a big training set and a held-out test set

00:21:00.000 | where the goal was to take an image and write

00:21:02.000 | a sentence-like, a short sentence caption

00:21:05.000 | that would seem like the kind of way

00:21:07.000 | a human would describe that image.

00:21:09.000 | And these systems surpassed human-level accuracy

00:21:12.000 | on the held-out test set from a big training set.

00:21:15.000 | But what you can see when you really dig into these things

00:21:17.000 | is there's often a lot of what I would call

00:21:19.000 | data set overfitting.

00:21:21.000 | It's not overfitting to the training set,

00:21:23.000 | but it's overfitting to whatever are the particular

00:21:25.000 | characteristics of this data set,

00:21:27.000 | wherever it came from, certain set of photographs

00:21:29.000 | and certain ways of captioning them,

00:21:32.000 | which even a big data set, it's not about quantity.

00:21:35.000 | It's more about the quality, the nature

00:21:37.000 | of what people are doing.

00:21:39.000 | So one way to test this system is to apply it

00:21:42.000 | to what seems like basically the same problem,

00:21:45.000 | but not within a certain curated or built data set.

00:21:49.000 | And there's a convenient Twitter bot that lets you do this.

00:21:52.000 | So there's something called the PickDeskBot,

00:21:54.000 | which takes one of the state-of-the-art

00:21:56.000 | industry AI captioning systems, a very good one.

00:21:58.000 | Again, this is not meant to--I'm not trying

00:22:00.000 | to critique these systems for what they're trying to do.

00:22:02.000 | I'm just trying to point out what they don't really

00:22:04.000 | even try to do.

00:22:06.000 | So this takes the Microsoft CaptionBot,

00:22:08.000 | and just every couple of hours takes a random image

00:22:10.000 | from the web, captions it, and uploads the results to Twitter.

00:22:13.000 | And a couple of months ago, when I prepared

00:22:15.000 | a first version of this talk, I just took a few days

00:22:18.000 | in the life of this Twitter bot.

00:22:20.000 | I didn't take every single image, but I took, you know,

00:22:22.000 | most of the images in a way that was meant

00:22:24.000 | to be representative of the successes

00:22:26.000 | and the kinds of failures that such a system will make.

00:22:28.000 | So we can go through this, and it's a little bit

00:22:30.000 | entertaining and I think quite informative.

00:22:32.000 | So here's just a somewhat random sample

00:22:35.000 | of a few days in the life of one of these CaptionBots.

00:22:39.000 | So here we have a picture of a person holding--

00:22:42.000 | fortunately, my screen is very small here,

00:22:44.000 | and I can't read up there, so maybe you'll have to tell me

00:22:46.000 | what it says--but a person holding a cell phone.

00:22:48.000 | I guess I'll just read along with you.

00:22:50.000 | So we have a person holding a cell phone.

00:22:52.000 | Well, it's not a person holding a cell phone,

00:22:54.000 | but it's kind of close.

00:22:56.000 | It's a person holding some kind of machine.

00:22:58.000 | I don't even know what that is,

00:23:00.000 | but it's some kind of musical instrument, right?

00:23:02.000 | So that's a mixed success or failure.

00:23:04.000 | Here's a person on a field playing football.

00:23:06.000 | I would call that an A result, maybe even A+.

00:23:10.000 | Here's a group of people standing on top of a mountain.

00:23:13.000 | So less good. There's a mountain,

00:23:15.000 | but as far as I can tell, there's no people.

00:23:17.000 | But these systems like to see people

00:23:19.000 | because in the data set they were trained on,

00:23:21.000 | there's a lot of people, and people often talk about people.

00:23:24.000 | And the fact that you can appreciate

00:23:26.000 | both what I said and why it's funny--

00:23:28.000 | there you did some of my cognitive activities

00:23:30.000 | that this system is not even trying to do.

00:23:32.000 | Okay, here we've got a building with a cake.

00:23:34.000 | I'll go through these fast.

00:23:36.000 | A building with a cake, a large stone building with a clock tower.

00:23:39.000 | I think that's pretty good. I'd give that like a B+.

00:23:41.000 | There's no clock, but it's plausibly right.

00:23:43.000 | There might be a clock in there.

00:23:45.000 | There's definitely something like that.

00:23:47.000 | Here's a truck parked on the side of a building.

00:23:49.000 | I don't know, maybe a B-.

00:23:51.000 | There is a car on the side of a building,

00:23:53.000 | but it's not a truck, and it doesn't seem like the main thing in the image.

00:23:56.000 | Here's a necklace made of bananas.

00:23:58.000 | (laughter)

00:24:00.000 | A large ship in the water.

00:24:02.000 | This is pretty good. I'd give this like an A- or B+,

00:24:04.000 | because there is a ship in the water, but it's not very large.

00:24:07.000 | It's really more of like a tugboat or something.

00:24:09.000 | Here's a sign sitting on the grass.

00:24:11.000 | You know, in some sense, that's great.

00:24:13.000 | No, but in another sense, it's really missing

00:24:15.000 | what's actually interesting and important and meaningful to humans.

00:24:18.000 | Here's a garden that's in the dirt.

00:24:22.000 | (laughter)

00:24:24.000 | A pizza sitting on top of a building.

00:24:26.000 | A small house with a red brick building.

00:24:28.000 | I don't know what kind of weird way of saying it.

00:24:30.000 | A vintage photo of a pond. That's good.

00:24:32.000 | They like vintage photos.

00:24:34.000 | A group of people that are standing in the grass near a bridge.

00:24:36.000 | Again, there's two people, and there's some grass, and there's a bridge,

00:24:39.000 | but it's really not what's going on.

00:24:41.000 | A person in a yard. Okay, kind of.

00:24:43.000 | A group of people standing on top of a boat.

00:24:45.000 | There's a boat, there's a group of people, they're standing,

00:24:48.000 | but again, the sentence that you see is more based on a bias

00:24:51.000 | of what people have said in the past about images that are only vaguely like this.

00:24:55.000 | A clock tower is lit up at night.

00:24:57.000 | That's actually, I think, pretty impressive.

00:24:59.000 | A large clock mounted to the side of a building.

00:25:01.000 | A little bit less so.

00:25:03.000 | A snow-covered field. Very good.

00:25:05.000 | A building with snow on the ground. A little bit less good.

00:25:07.000 | There's no snow. It's white.

00:25:09.000 | Some people who--I don't know them, but I bet that's probably right,

00:25:12.000 | because identifying faces and recognizing people who are famous

00:25:15.000 | because they won medals in the Olympics,

00:25:17.000 | probably I would trust current pattern recognition systems to get that.

00:25:21.000 | A painting of a vase in front of a mirror.

00:25:23.000 | Less good.

00:25:25.000 | I think there's a guy in there, but we didn't get him.

00:25:27.000 | A person walking in the rain.

00:25:29.000 | Again, there is sort of a person, and there's some puddles,

00:25:32.000 | but a group of stuffed animals.

00:25:35.000 | [laughter]

00:25:37.000 | A car parked in a parking lot. That's good.

00:25:39.000 | A car parked in front of a building.

00:25:41.000 | Less good.

00:25:43.000 | A plate with a fork and knife. A clear blue sky.

00:25:45.000 | Okay, so you get the idea.

00:25:47.000 | Again, if you actually go and play with this system,

00:25:50.000 | partly because I think--my friends at Microsoft told me they've improved it some.

00:25:54.000 | This is partly for entertainment values.

00:25:56.000 | I chose what also would be the funnier example.

00:25:59.000 | I want to be quite honest about this.

00:26:01.000 | I'm not trying to take away what are impressive AI technologies,

00:26:05.000 | but I think it's clear that there's a sense of understanding

00:26:08.000 | in any one of these images that it's important to see

00:26:11.000 | that even when it seems to be correct,

00:26:13.000 | if it can make the kind of errors that it makes,

00:26:15.000 | that even when it seems to be correct,

00:26:17.000 | it's probably not doing what you're doing,

00:26:19.000 | and it's probably not even trying to scale towards the dimensions

00:26:22.000 | of intelligence that we think about when we're talking about human intelligence.

00:26:26.000 | Another way to put this--I'm going to show you a really insightful blog post

00:26:29.000 | from one of your other speakers.

00:26:31.000 | In a couple of days, I'm not sure, you're going to have Andrej Karpathy,

00:26:34.000 | who's one of the leading people in deep learning.

00:26:38.000 | This is a really great blog post he wrote a couple of years ago

00:26:41.000 | when he was, I think, still at Stanford.

00:26:43.000 | He got his PhD from Stanford.

00:26:45.000 | He worked at Google a little bit on some early big neural net AI projects there.

00:26:51.000 | He was at OpenAI. He was one of the founders of OpenAI.

00:26:54.000 | Recently, he joined Tesla as their director of AI research.

00:26:58.000 | About five years ago, he was looking at the state of computer vision

00:27:02.000 | from a human intelligence point of view and lamenting how far away we were.

00:27:06.000 | This is the title of his blog post, "The State of Computer Vision and AI."

00:27:10.000 | We are really, really far away.

00:27:12.000 | He took this image, which was a famous image in its own right.

00:27:17.000 | It was a popular image of Obama back when he was president

00:27:20.000 | playing around as he liked to do when he was on tour.

00:27:22.000 | If you take a look at this, you can see you probably all can recognize

00:27:26.000 | the previous president of the United States,

00:27:28.000 | but you can also get the sense of where he is and what's going on.

00:27:31.000 | You might see people smiling, and you might get the sense

00:27:33.000 | that he's playing a joke on someone. Can you see that?

00:27:36.000 | How do you know that he's playing a joke and what that joke is?

00:27:40.000 | As Andrej goes on to talk about in his blog post,

00:27:43.000 | if you think about all the things that you have to really deploy in your mind

00:27:47.000 | to understand that, it's a huge list.

00:27:49.000 | Of course, it starts with seeing people and objects

00:27:51.000 | and maybe doing some face recognition,

00:27:53.000 | but you have to do things like, for example,

00:27:55.000 | notice his foot on the scale and understand enough about how scales work

00:27:59.000 | that when a foot presses down, it exerts force,

00:28:01.000 | that the scale is sensitive.

00:28:03.000 | It doesn't just magically measure people's weight,

00:28:05.000 | but it does that somehow through force.

00:28:07.000 | You have to see who can see that he's doing that

00:28:09.000 | and who cannot see that he's doing that,

00:28:11.000 | in particular the person on the scale,

00:28:13.000 | and why some people can see that he's doing that

00:28:15.000 | and can see that some other people can't see it,

00:28:17.000 | why that makes it funny to them.

00:28:19.000 | Someday we should have machines that can understand this,

00:28:23.000 | but hopefully you can see why the kind of architecture

00:28:28.000 | that I'm talking about would be the building blocks

00:28:31.000 | or the ingredients to be able to get them to do that.

00:28:34.000 | Again, I prepared a version of this talk a few months ago,

00:28:37.000 | and I wrote to Andrej and I said I was going to use this,

00:28:40.000 | and I was curious if he had any reflections on this

00:28:44.000 | and where he thought we were relative to five years ago,

00:28:47.000 | because certainly a lot of progress has been made.

00:28:49.000 | But he said--here's his email.

00:28:51.000 | I hope he doesn't mind me sharing it,

00:28:53.000 | but again, he's a very honest person,

00:28:55.000 | and that's one of the many reasons why he's such an important person right now in AI.

00:28:58.000 | He's both very technically strong and honest about what we can do, what we can't do,

00:29:02.000 | and as he says--what does he say?

00:29:04.000 | "It's nice to hear from you. It's fun you should bring this up.

00:29:06.000 | I was also thinking about writing a return to this."

00:29:09.000 | And in short, basically, I don't believe we've made very much progress.

00:29:12.000 | He points out that in his long list of things that you'd need to understand the image,

00:29:16.000 | we have made progress on some--the ability to, again, detect people

00:29:19.000 | and do face recognition for well-known individuals.

00:29:22.000 | But that's kind of about it.

00:29:24.000 | And he wasn't particularly optimistic that the current route that's being pursued in industry

00:29:28.000 | is anywhere close to solving or even really trying to solve these larger questions.

00:29:34.000 | If we give this image to that caption bot,

00:29:38.000 | what we see is, again, represents the same point.

00:29:41.000 | So here's the caption bot.

00:29:42.000 | It says, "I think it's a group of people standing next to a man in a suit and tie."

00:29:46.000 | So that's right as far as it goes.

00:29:49.000 | It just doesn't go far enough, and the current ideas of build a data set,

00:29:54.000 | train a deep learning algorithm on it, and then repeat

00:29:58.000 | aren't really even, I would venture, trying to get to what we're talking about.

00:30:03.000 | Or here's another--I'll just give you one other example of a couple of photographs

00:30:06.000 | from my recent vacation in a nice, warm, tropical locale,

00:30:11.000 | which I think illustrate ways in which, again, the gap where we have machines

00:30:15.000 | that can, say, beat the world's best at Go,

00:30:19.000 | but can't even beat a child at tic-tac-toe.

00:30:21.000 | Now, what do I mean by that?

00:30:23.000 | Well, of course, we don't even need reinforcement learning or deep learning

00:30:26.000 | to build a machine that can win or tie, do optimally in tic-tac-toe.

00:30:31.000 | But think about this.

00:30:32.000 | This is a real tic-tac-toe game, which I saw on the grass outside my hotel.

00:30:37.000 | What do you have to do to look at this and recognize that it's a tic-tac-toe game?

00:30:40.000 | You have to see the objects.

00:30:41.000 | You have to see what's--in some sense, there's a 3-by-3 grid,

00:30:45.000 | but it's only abstract.

00:30:46.000 | It's only delimited by these ropes or strings.

00:30:51.000 | It's not actually a grid in any simple geometric sense.

00:30:55.000 | But yet a child can look at that--and indeed, here's an actual child

00:30:57.000 | who was looking at it--and recognize, "Oh, it's a game of tic-tac-toe,"

00:31:00.000 | and even know what they need to do to win, namely put the X and complete it,

00:31:03.000 | and now they've got three in a row.

00:31:04.000 | That's literally child's play.

00:31:07.000 | You show this sort of thing, though, to one of these image-understanding

00:31:10.000 | caption bots, and I think it's a close-up of a sign.

00:31:14.000 | Again, saying that this is a close-up of a sign is not the same thing,

00:31:22.000 | I would venture, as a cognitive or computational activity

00:31:25.000 | that's going to give us what we need to, say, recognize the object,

00:31:28.000 | to recognize it as a game, to understand the goal,

00:31:30.000 | and how to plan to achieve those goals.

00:31:32.000 | Whereas this kind of architecture is designed to try to do

00:31:35.000 | all of these things, ultimately.

00:31:37.000 | I bring in these examples of games or jokes to really show where perception

00:31:43.000 | goes to cognition, all the way up to symbols.

00:31:47.000 | So to get objects and forces and mental states, that's the cognitive core,

00:31:52.000 | but to be able to get goals and plans and what do I do

00:31:56.000 | or how do I talk about it, that's symbols.

00:31:59.000 | Here's another way into this, and it's one that also motivates, I think,

00:32:02.000 | a lot of really good work on the engineering side,

00:32:04.000 | and a lot of our interest in the science side, is think about robotics

00:32:08.000 | and think about what do you have to do to --

00:32:11.000 | what does the brain have to be like to control the body?

00:32:14.000 | So again, you're going to hear from, shortly, I think maybe it's next week,

00:32:18.000 | from Mark Raybert, who's one of the founders of Boston Dynamics,

00:32:22.000 | which is one of my favorite companies anywhere.

00:32:25.000 | They're without doubt the leading maker of humanoid robots,

00:32:29.000 | legged locomoting robots in industry.

00:32:32.000 | They have all sorts of other really cool robots, robots like dogs,

00:32:36.000 | robots that have -- I think you'll even get to see a live demonstration

00:32:40.000 | of one of these robots. It's really awesome, impressive stuff.

00:32:44.000 | But what about the minds and brains of these robots?

00:32:47.000 | Well, again, if you ask Mark, ask them how much of human-like cognition

00:32:51.000 | do they have in their robots, and I think he would say very little.

00:32:54.000 | In fact, we have asked him that, and he would say very little.

00:32:57.000 | He has said very little.

00:32:59.000 | He's actually one of the advisors of our center, and I think in many ways

00:33:02.000 | we're very much on the same page. We both want to know,

00:33:05.000 | how do you build the kind of intelligence that can control these bodies

00:33:08.000 | like the way a human does?

00:33:11.000 | Here's another example of an industry robotics effort.

00:33:13.000 | This is Google's Arm Farm, where they've got lots of robot arms,

00:33:16.000 | and they're trying to train them to pick up objects using various kinds

00:33:19.000 | of deep learning and reinforcement learning techniques.

00:33:22.000 | I think it's one approach. I just think it's very, very different

00:33:25.000 | from the way humans learn to, say, control their body and manipulate objects.

00:33:29.000 | You can see that in terms of things that go back to what you were saying

00:33:32.000 | when you were introducing me. Think about how quickly we learn things.

00:33:35.000 | Here you have the Arm Farmers trying to generate, effectively,

00:33:39.000 | maybe if not infinite, but hundreds of thousands, millions of examples

00:33:43.000 | of reaches and pickups of objects, even with just a single gripper.

00:33:47.000 | Yet a child, who in some ways can't control their body

00:33:50.000 | nearly as well as robots can be controlled at the low level,

00:33:54.000 | is able to do so much more.

00:33:56.000 | I'll show you two of my favorite videos from YouTube here,

00:33:59.000 | which motivate some of the research that we're doing.

00:34:01.000 | The one on the left is a one-and-a-half-year-old,

00:34:03.000 | and the other one's a one-year-old.

00:34:05.000 | Just watch this one-and-a-half-year-old here doing a popular activity

00:34:08.000 | for many kids. Is it playing?

00:34:11.000 | Hmm.

00:34:13.000 | Do you see a video up there?

00:34:18.000 | Hmm. Okay, there we go.

00:34:20.000 | Okay, so he's doing this stacking cup activity.

00:34:24.000 | He's stacking up cups to make a tall tower.

00:34:27.000 | He's got a stack of three, and what you can see

00:34:29.000 | from the first part of this video is it looks like he's trying

00:34:32.000 | to make a second stack that he's trying to pick up at once.

00:34:35.000 | Basically, he's trying to make a stack of two

00:34:38.000 | that'll go on the stack of three.

00:34:40.000 | And he's trying to debug his plan, because it got a little bit stuck here.

00:34:44.000 | And think about it. I mean, again, if you know anything

00:34:48.000 | about robots manipulating objects, even just what he just did,

00:34:51.000 | no robot can decide to do that and actually do it.

00:34:54.000 | At some point, he's almost got it. It's a little bit tricky,

00:34:57.000 | but at some point he's going to get that stack of two.

00:35:00.000 | He realizes he has to move that object out of the way.

00:35:02.000 | Look at what he just did. Move it out of the way,

00:35:04.000 | use two hands to pick it up.

00:35:06.000 | And now he's got a stack of two on a stack of three,

00:35:08.000 | and suddenly, you know, sub-goal completed.

00:35:10.000 | He's now got a stack of five.

00:35:12.000 | He's got a stack of 10, because he knows he accomplished

00:35:14.000 | a key waypoint along the way to his final goal.

00:35:17.000 | That's a kind of early symbolic cognition, right?

00:35:19.000 | To understand that I'm trying to build a tall tower,

00:35:22.000 | but a tower is made up of little towers.

00:35:24.000 | And you can take a tower and put it on top of another tower,

00:35:27.000 | or stack a stack on a stack, and you have a bigger stack.

00:35:30.000 | So think about how he goes from bottom-up perception

00:35:33.000 | to the objects, to the physics needed to manipulate the objects,

00:35:36.000 | to the ability to make even those early kinds of symbolic plans.

00:35:39.000 | At some point, he keeps doing this.

00:35:41.000 | He puts another stack on there.

00:35:43.000 | I'll just jump to the end.

00:35:45.000 | Oops, sorry, you missed--sorry.

00:35:47.000 | He gets really excited,

00:35:49.000 | and he gives himself another big hand, but falls over.

00:35:52.000 | OK, again, Boston Dynamics now has robots

00:35:55.000 | that could pick themselves up after that.

00:35:57.000 | That's really impressive, again.

00:35:59.000 | But all the other stuff to get to that point,

00:36:01.000 | we don't really know how to do in a robotic setting.

00:36:03.000 | Or think about this baby here. This is a younger baby.

00:36:06.000 | This is one of the Internet's very most popular videos

00:36:09.000 | because it features a baby and a cat.

00:36:11.000 | (laughter)

00:36:13.000 | But the baby's doing something interesting.

00:36:15.000 | He's got the same cups, but he's decided--

00:36:17.000 | he's, again, decided to try a new thing.

00:36:19.000 | So think about creativity.

00:36:21.000 | He's decided that his goal

00:36:23.000 | is to stack up cups on the back of a cat, I guess.

00:36:25.000 | He's asking, "How many cups can I fit on the back of a cat?"

00:36:28.000 | Well, three. Let's see, can I fit more?

00:36:31.000 | Let's try another one.

00:36:33.000 | OK, well, he can't fit more than three, it turns out.

00:36:35.000 | And then he-- then, ugh, it's not working.

00:36:37.000 | So he changes his goal.

00:36:39.000 | Now his goal appears to be to get the cups

00:36:41.000 | on the other side of the cat.

00:36:43.000 | Now watch that part when he reaches back behind him there.

00:36:45.000 | That's--I'll just pause it there for a moment.

00:36:47.000 | So when he just reached back there,

00:36:49.000 | that's a particularly striking moment in the video.

00:36:51.000 | It shows a very strong form

00:36:53.000 | of what we call in cognitive science object permanence.

00:36:56.000 | That's the idea that you represent objects

00:36:59.000 | as these permanent, enduring entities in the world,

00:37:01.000 | even when you can't see them.

00:37:03.000 | In this case, he hadn't seen or touched that object behind him

00:37:06.000 | for, like, at least a minute, right?

00:37:08.000 | Maybe much longer, I don't know.

00:37:10.000 | And yet he still knew it was there,

00:37:12.000 | and he was able to incorporate it in his plan.

00:37:14.000 | There's a moment before that when he's about to reach for it,

00:37:16.000 | but then he sees this other one, right?

00:37:18.000 | And it's only when he's now exhausted all the other objects here

00:37:20.000 | that he can see, he's like, "OK, now time to get this object

00:37:22.000 | and bring it into play," right?

00:37:24.000 | So think about what has to be going on in his brain

00:37:26.000 | for him to be able to do that, right?

00:37:28.000 | That's like the analog of you understanding what's behind you.

00:37:31.000 | It's not that these things are impossible to capture machines.

00:37:33.000 | Far from it.

00:37:35.000 | It's just that training a deep neural network

00:37:37.000 | or any kind of pattern recognition system,

00:37:39.000 | we don't think is going to do it.

00:37:41.000 | But we think by reverse engineering how it works in the brain,

00:37:43.000 | we might be able to do it.

00:37:45.000 | I believe we can do it.

00:37:47.000 | It's not just humans that do this kind of activity.

00:37:49.000 | Here's a couple of, again, rather famous videos.

00:37:51.000 | You can watch all of these on YouTube.

00:37:53.000 | Crows are famous object manipulators and tool users,

00:37:55.000 | but also orangutans, other primates, rodents.

00:37:59.000 | We can watch--here, let me pause this one for a second.

00:38:02.000 | If we watch this orangutan here, he's got a bunch of big Legos,

00:38:05.000 | and over the course of this video,

00:38:07.000 | he's building up a stack of Legos.

00:38:11.000 | It's really quite impressive.

00:38:13.000 | I'm just jumping to the end.

00:38:17.000 | There's actually some controversy out there

00:38:19.000 | of whether this video is a fake.

00:38:21.000 | But the controversy isn't about--

00:38:23.000 | it's not like whether it was, I don't know,

00:38:25.000 | done with computer animation.

00:38:27.000 | Some people think the video was actually filmed backwards,

00:38:29.000 | that a human built up the stack,

00:38:31.000 | and the orangutan just slowly disassembled it piece by piece.

00:38:33.000 | And it turns out it's remarkably hard to tell

00:38:35.000 | whether it's played forward or backwards in time,

00:38:37.000 | and people have argued over little details.

00:38:39.000 | Because it would be quite impressive

00:38:41.000 | if an orangutan actually was able to build up

00:38:43.000 | this really impressive stack of Legos.

00:38:45.000 | But I would submit that it would be almost as impressive

00:38:47.000 | if he disassembled it.

00:38:49.000 | Think about the activity.

00:38:51.000 | I mean, if I wanted to disassemble that,

00:38:53.000 | the easiest thing to do would just be to knock it over.

00:38:55.000 | But to piece by piece disassemble it,

00:38:57.000 | even if it's played backwards like this,

00:38:59.000 | that's still a really impressive act

00:39:01.000 | of symbolic planning on physical objects.

00:39:03.000 | Or here you've got this famous mouse.

00:39:05.000 | This you can find on the internet

00:39:07.000 | under the "Mouse vs. Cracker" video.

00:39:09.000 | And what you'll see here over the course of this video

00:39:11.000 | is a mouse valiantly and mostly hopelessly

00:39:13.000 | struggling with a cracker

00:39:15.000 | that they're hoping to bring back to their nest.

00:39:17.000 | I guess it's a very appealing big meal.

00:39:19.000 | And at some point after just trying to get it over

00:39:21.000 | with, it's actually able to do it.

00:39:23.000 | And at some point after just trying to get it

00:39:25.000 | over the wall,

00:39:27.000 | at some point the mouse just gives up

00:39:29.000 | because it's just never going to happen.

00:39:31.000 | And he just goes away.

00:39:33.000 | Except that because even mouses can dream,

00:39:35.000 | or mice can dream,

00:39:37.000 | at some point he decides, "Okay, I'm just going to come back

00:39:39.000 | for one more try."

00:39:41.000 | And he tries one more time, and this time valiantly gets it over.

00:39:43.000 | Isn't that very impressive?

00:39:45.000 | Congratulations, mouse.

00:39:47.000 | You can clap for me at the end, or clap for whoever later.

00:39:49.000 | But I want to applaud the mouse there

00:39:51.000 | every time I see that.

00:39:53.000 | But again, think what had to be going on in his brain

00:39:55.000 | to be able to do that.

00:39:57.000 | It's a crazy thing, and yet he formulated

00:39:59.000 | the goal and was able to achieve it.

00:40:01.000 | I'll just show one more video

00:40:03.000 | that is really more about science.

00:40:05.000 | These other ones are, some of them actually were

00:40:07.000 | from scientific experiments.

00:40:09.000 | But this is one that motivates a lot of the science

00:40:11.000 | that I do, and to me it sets up

00:40:13.000 | a grand cognitive science challenge

00:40:15.000 | for AI and robotics.

00:40:17.000 | It's from an experiment with humans, again,

00:40:19.000 | 18-month-olds or 1-1/2-year-olds.

00:40:21.000 | The kids in this experiment were the same age as

00:40:23.000 | the first baby I showed you, the one who did the stacking.

00:40:25.000 | And 18 months is really a very

00:40:27.000 | good age to study if you're

00:40:29.000 | interested in intelligence, for reasons we can talk

00:40:31.000 | about later if you're interested.

00:40:33.000 | This is from a very famous experiment done

00:40:35.000 | by two psychologists, Felix Wernicken

00:40:37.000 | and Michael Tomasello.

00:40:39.000 | It was studying the spontaneous helping

00:40:41.000 | behavior of young children.

00:40:43.000 | It also contrasted humans and chimps.

00:40:45.000 | The punchline is that chimps sometimes

00:40:47.000 | do things that are kind of like what this human did,

00:40:49.000 | but not nearly as reliably or

00:40:51.000 | as flexibly.

00:40:53.000 | I'll show you a particular

00:40:55.000 | unusual situation

00:40:57.000 | where human kids had relatively

00:40:59.000 | little trouble figuring out what to do

00:41:01.000 | or even whether they should do it, whereas

00:41:03.000 | basically no chimp did what you're going to see

00:41:05.000 | humans sometimes doing here.

00:41:07.000 | The experimenter in this movie,

00:41:09.000 | and I'll turn on the sound here if you can hear it,

00:41:11.000 | the experimenter is the tall guy,

00:41:13.000 | and the participant is

00:41:15.000 | the little kid in the corner.

00:41:17.000 | There's

00:41:19.000 | sound but no words, right?

00:41:21.000 | And at some point he

00:41:23.000 | stops and then the kid just does whatever they want

00:41:25.000 | to do. So watch what he does. He goes over,

00:41:27.000 | he opens the cabinet,

00:41:29.000 | looks inside, then he

00:41:31.000 | steps back and he looks up

00:41:33.000 | at Felix and then looks down,

00:41:35.000 | and then the action is completed.

00:41:37.000 | I want you to watch it one more

00:41:39.000 | time and think about what's got to be going inside

00:41:41.000 | the kid's head to understand this.

00:41:43.000 | So it seems like

00:41:45.000 | what it looks like to us is the kid figured out that this

00:41:47.000 | guy needed help and helped him. And the paper

00:41:49.000 | is full of many other situations like this.

00:41:51.000 | This is just one. But the key idea

00:41:53.000 | is that the situation is somewhat novel.

00:41:55.000 | People have seen people holding books and

00:41:57.000 | opening cabinets, but probably it's

00:41:59.000 | very rare to see this kind of situation

00:42:01.000 | exactly. It's different in some important

00:42:03.000 | details from what you might have seen before.

00:42:05.000 | And there's other ones in there that are really truly novel because

00:42:07.000 | they just made up a machine right there.

00:42:09.000 | But somehow he has to understand

00:42:11.000 | causally from the way the guy

00:42:13.000 | is banging the books against the thing.

00:42:15.000 | It's sort of both a symbol, but it's also

00:42:17.000 | somehow he's got to understand what he

00:42:19.000 | can do and what he can't do, and then what

00:42:21.000 | the kid can do to help.

00:42:23.000 | I'll show this again, but really just

00:42:25.000 | watch. The main part I want you to see

00:42:27.000 | is

00:42:29.000 | I'll just

00:42:31.000 | sort of skip ahead.

00:42:33.000 | So watch this part

00:42:35.000 | here. Let's say I'll just jump.

00:42:37.000 | Right now he's about

00:42:39.000 | to look up. He looks up and makes

00:42:41.000 | eye contact, and then his eyes look down.

00:42:43.000 | So again,

00:42:45.000 | he looks up,

00:42:47.000 | he looks up, and then a saccade, a

00:42:49.000 | sudden rapid eye movement down, down

00:42:51.000 | to his hands, up, down.

00:42:53.000 | Again, that's this brain OS

00:42:55.000 | in action. He's making one glance,

00:42:57.000 | small glance, at

00:42:59.000 | the big guy's eyes to make

00:43:01.000 | eye contact, to see, to get a signal,

00:43:03.000 | did I understand what you

00:43:05.000 | wanted, and did you register

00:43:07.000 | that joint attention. And then

00:43:09.000 | he makes a prediction about what the guy's going to do,

00:43:11.000 | so he looks right down. He doesn't just

00:43:13.000 | look around randomly. He looks right down

00:43:15.000 | to the guy's hands to track the

00:43:17.000 | action that he expects to see happening.

00:43:19.000 | If I did the right thing to help you, then I expect

00:43:21.000 | you're going to put the books there.

00:43:23.000 | So you can see these things happening, and we

00:43:25.000 | want to know what's going on inside the mind that guides

00:43:27.000 | all of that.

00:43:29.000 | So that's this sort of big scientific agenda

00:43:31.000 | that we're working on over the next few years,

00:43:33.000 | where we think some kind of human

00:43:35.000 | understanding of human

00:43:37.000 | intelligence in scientific terms could

00:43:39.000 | lead to all sorts of AI chaos.

00:43:41.000 | In particular, suppose we could build

00:43:43.000 | a robot that could do what this kid and many other

00:43:45.000 | kids in these experiments do, to say, "Help you

00:43:47.000 | out around the house without having to be programmed

00:43:49.000 | or even really instructed, just to kind of

00:43:51.000 | get a sense. Oh yeah, you need a hand with that?

00:43:53.000 | Sure, let me help you out."

00:43:55.000 | Even 18-month-olds will do that. Sometimes

00:43:57.000 | not very reliably or effectively. Sometimes

00:43:59.000 | they'll try to help and really do the opposite.

00:44:01.000 | But imagine if you could

00:44:03.000 | take the flexible

00:44:05.000 | understanding of humans' actions,

00:44:07.000 | goals, and so on, and make those

00:44:09.000 | reliable engineering technology. That would be

00:44:11.000 | very useful. And it would also be

00:44:13.000 | related to, say, machines that you could actually

00:44:15.000 | start to talk to and trust in some

00:44:17.000 | ways, that shared understanding.

00:44:19.000 | So how are we going to do this? Well, let me spend

00:44:21.000 | the rest of the time talking about how we try

00:44:23.000 | to do this. Some of the

00:44:25.000 | technology that we're building both in our

00:44:27.000 | group and more broadly to try to make

00:44:29.000 | these kinds of architectures real.

00:44:31.000 | And I'll talk about two or

00:44:33.000 | three technical ideas. Again, not in any

00:44:35.000 | detail. One

00:44:37.000 | is the idea of a probabilistic program.

00:44:39.000 | So this is a kind of

00:44:41.000 | a... think of it as

00:44:43.000 | a computational

00:44:45.000 | abstraction that we can use to capture

00:44:47.000 | the common sense knowledge of this core

00:44:49.000 | cognition. So when I say we have an intuitive

00:44:51.000 | understanding of physical objects and people's goals,

00:44:53.000 | how do I build a model

00:44:55.000 | of that model you have in the head?

00:44:57.000 | Probabilistic programs, a little bit more technically,

00:44:59.000 | are... one way to understand

00:45:01.000 | them is as a generalization of Bayesian

00:45:03.000 | networks or other kinds of directed graphical

00:45:05.000 | models, if you know those.

00:45:07.000 | But where instead of defining a probability

00:45:09.000 | model on a graph,

00:45:11.000 | you define it on a program.

00:45:13.000 | And thereby

00:45:15.000 | have access to a much more

00:45:17.000 | expressive toolkit of knowledge representation.

00:45:19.000 | So data structures, other kinds

00:45:21.000 | of algorithmic tools for representing knowledge.

00:45:23.000 | But you still have access

00:45:25.000 | to the ability to do probabilistic inference,

00:45:27.000 | like in a graphical model,

00:45:29.000 | but also causal inference in a

00:45:31.000 | directed graphical model. So for those of you

00:45:33.000 | who know about graphical models, that might make some

00:45:35.000 | sense to you. But just more broadly, what this is,

00:45:37.000 | think of this as a toolkit that allows

00:45:39.000 | us to combine several of the best

00:45:41.000 | ideas, not just of the recent

00:45:43.000 | deep learning era, but over... if you look back over

00:45:45.000 | the whole scope of AI as well as

00:45:47.000 | cognitive science, I think there's three or

00:45:49.000 | four ideas, and

00:45:51.000 | more, but definitely like three ideas we can really

00:45:53.000 | put up there that have proven

00:45:55.000 | their worth and have risen

00:45:57.000 | and fallen in terms of... each of these had

00:45:59.000 | ideas when the mainstream of the field thought

00:46:01.000 | this was totally the way to go and every other idea

00:46:03.000 | was obviously a waste of time.

00:46:05.000 | And also had its time when

00:46:07.000 | many people thought it was a waste of time.

00:46:09.000 | And these three big ideas, I would say,

00:46:11.000 | are first of all the idea of symbolic

00:46:13.000 | representation or symbolic languages

00:46:15.000 | for knowledge representation, probabilistic

00:46:17.000 | inference in generative models

00:46:19.000 | to capture uncertainty, ambiguity,

00:46:21.000 | learning from sparse data, and in their

00:46:23.000 | hierarchical setting, learning to learn.

00:46:25.000 | And then, of course,

00:46:27.000 | the recent developments with neural

00:46:29.000 | inspired architectures for pattern recognition.

00:46:31.000 | Each of these

00:46:33.000 | things, each of these ideas, symbolic

00:46:35.000 | languages, probabilistic inference, and

00:46:37.000 | neural networks, has some distinctive strengths

00:46:39.000 | that are real weak points of the other

00:46:41.000 | approaches. So to take one example that I

00:46:43.000 | haven't really talked about here,

00:46:45.000 | people in the... but you mentioned

00:46:47.000 | as an outstanding challenge for neural networks,

00:46:49.000 | transfer learning, or learning

00:46:51.000 | to take knowledge across a number of previous

00:46:53.000 | tasks to transfer to others. This is a real challenge

00:46:55.000 | and has always been a challenge in a neural net.

00:46:57.000 | But it's something that's addressed

00:46:59.000 | very naturally and very scalably

00:47:01.000 | in, for example, a hierarchical Bayesian model.

00:47:03.000 | And if you look at some of the recent attempts,

00:47:05.000 | really interesting attempts within the

00:47:07.000 | deep learning world to try to get kinds of transfer

00:47:09.000 | learning and learning to learn, they're really cool.

00:47:11.000 | But many of them are

00:47:13.000 | in some ways kind of reinventing

00:47:15.000 | within a neural network paradigm, ideas

00:47:17.000 | that people, maybe just 10 or 15

00:47:19.000 | years ago, developed in very sophisticated

00:47:21.000 | ways in, let's say, hierarchical Bayesian

00:47:23.000 | models. And a lot of attempts

00:47:25.000 | to get sort of symbolic

00:47:27.000 | algorithm-like behavior in neural networks

00:47:29.000 | again, are really, you know,

00:47:31.000 | they're very small steps towards something

00:47:33.000 | which is a very mature technology

00:47:35.000 | in computer systems and programming languages.

00:47:37.000 | Probabilistic programs,

00:47:39.000 | I'll just sort of advertise mostly,

00:47:41.000 | are a way to combine the strengths

00:47:43.000 | of all of these approaches, to have

00:47:45.000 | knowledge representations which are as expressive

00:47:47.000 | as anything that anybody ever did in the symbolic

00:47:49.000 | paradigm, that are as flexible

00:47:51.000 | at dealing with uncertainty and sparse

00:47:53.000 | data as anything in the probabilistic paradigm,

00:47:55.000 | but that also can support pattern

00:47:57.000 | recognition tools to be

00:47:59.000 | able to, for example, do very fast,

00:48:01.000 | efficient inference in very complex

00:48:03.000 | scenarios. And there's a number of

00:48:05.000 | -- that's the kind of conceptual framework.

00:48:07.000 | There's a number of actually implemented tools.

00:48:09.000 | I point to here on the slide

00:48:11.000 | a number of probabilistic programming languages

00:48:13.000 | which you can go explore.

00:48:15.000 | For example, there's one that was developed

00:48:17.000 | in our group a few years ago, almost 10 years ago

00:48:19.000 | now, called Church, which was the antecedent

00:48:21.000 | of some of these other languages built on a

00:48:23.000 | functional programming core. So Church is a probabilistic

00:48:25.000 | programming language built on the lambda

00:48:27.000 | calculus, or really in Lisp, basically.

00:48:29.000 | But there are many other

00:48:31.000 | more modern tools, especially

00:48:33.000 | if you are interested in neural networks.

00:48:35.000 | There are tools like, for example, Pyro

00:48:37.000 | or ProbTorch or BayesFlow

00:48:39.000 | that try to combine all

00:48:41.000 | these ideas in a -- or for example,

00:48:43.000 | in here, which is a project of Vakash Mansinghka's

00:48:45.000 | Probabilistic Computing Group.

00:48:47.000 | These are all things which are

00:48:49.000 | just in the very beginning stages, very,

00:48:51.000 | very alpha. But you can find

00:48:53.000 | out more about them online or by writing

00:48:55.000 | to their creators. And I think this is

00:48:57.000 | a very exciting place where

00:48:59.000 | the convergence of a number of different AI

00:49:01.000 | tools are happening.

00:49:03.000 | And this will be absolutely

00:49:05.000 | necessary for making the kind of architecture that

00:49:07.000 | I'm talking about work.

00:49:09.000 | Another key idea, which we've been building

00:49:11.000 | on in our lab, and I

00:49:13.000 | think, again, many people are using some version

00:49:15.000 | of this idea, but maybe a little bit different

00:49:17.000 | from the way we're doing it,

00:49:19.000 | is -- well, the version of this idea

00:49:21.000 | that I like to talk about is what I call the game

00:49:23.000 | engine in the head. So this is the idea

00:49:25.000 | that it's really what the

00:49:27.000 | programs are about. When I talk about probabilistic

00:49:29.000 | programs, I haven't said anything about what

00:49:31.000 | kind of programs we're using. We're just basically --

00:49:33.000 | these probabilistic programming languages at their best

00:49:35.000 | and Church, the language that

00:49:37.000 | was developed by Noah Goodman and Vakash

00:49:39.000 | and others and Dan Roy and our group some

00:49:41.000 | 10 years ago, was intended to be a

00:49:43.000 | Turing-complete probabilistic programming

00:49:45.000 | language. So any probability model

00:49:47.000 | that was computable or for whose

00:49:49.000 | inferences -- conditional inferences -- are computable,

00:49:51.000 | you could represent in these languages.

00:49:53.000 | But that leaves completely open

00:49:55.000 | what I'm actually going to -- what

00:49:57.000 | kind of program I'm going to write to model

00:49:59.000 | the world. And I've been very

00:50:01.000 | inspired in the last few years by

00:50:03.000 | thinking about the kinds of programs that are in modern

00:50:05.000 | video game engines.

00:50:07.000 | So again, probably most of you are familiar with these,

00:50:09.000 | but if you're -- and increasingly they're playing

00:50:11.000 | a role in all sorts of ways in AI. But these are

00:50:13.000 | tools that were developed by the video game industry

00:50:15.000 | to allow a game designer

00:50:17.000 | to make a new game

00:50:19.000 | without having to do most of --

00:50:21.000 | in some sense, most of the hard technical work

00:50:23.000 | from scratch, but rather to focus

00:50:25.000 | on the characters, the world,

00:50:27.000 | the story, the things that are more

00:50:29.000 | interesting for designing a

00:50:31.000 | novel game. In particular,

00:50:33.000 | if we want a player to explore some

00:50:35.000 | new three-dimensional world,

00:50:37.000 | but to have them be able to interact with the world in real

00:50:39.000 | time and to render nice-looking

00:50:41.000 | graphics in real time

00:50:43.000 | in an interactive way as the player moves around

00:50:45.000 | and explores the world. Or if you want to

00:50:47.000 | populate the world with non-player characters that

00:50:49.000 | will behave in an even vaguely intelligent way.

00:50:51.000 | Okay? Game engines give you

00:50:53.000 | tools for doing all of this without having to write

00:50:55.000 | all of graphics from scratch or all

00:50:57.000 | of physics -- the rules of physics

00:50:59.000 | from scratch. So what are called

00:51:01.000 | game physics engines, and in some

00:51:03.000 | sense are a set of principles,

00:51:05.000 | but also hacks from Newtonian mechanics

00:51:07.000 | and other areas of physics that allow you to

00:51:09.000 | simulate plausible-looking physical

00:51:11.000 | interactions in very complex worlds

00:51:13.000 | very approximately, but

00:51:15.000 | very fast. There's also what's called

00:51:17.000 | game AI, which are basically very simple

00:51:19.000 | planning models. So let's say I want to

00:51:21.000 | have an AI in the game that is

00:51:23.000 | like a guard that guards a base, and a

00:51:25.000 | player is going to attack this base. So back in

00:51:27.000 | the old Atari days, like when I was a kid,

00:51:29.000 | the guards would just be like random

00:51:31.000 | things that would fire missiles kind of randomly

00:51:33.000 | in random directions at random times, right?

00:51:35.000 | But let's say you want a guard to be a little intelligent,

00:51:37.000 | so to actually look around and

00:51:39.000 | "Oh, and I see the player," and then to actually start

00:51:41.000 | shooting at you and to even maybe pursue you.

00:51:43.000 | So that requires putting a little AI in

00:51:45.000 | the game, and you do that by having

00:51:47.000 | basically simple agent models in the game.

00:51:49.000 | So what we think,

00:51:51.000 | and some of you might think this is crazy, and some

00:51:53.000 | of you might think this is a very natural idea

00:51:55.000 | -- I get both kinds of reactions -- what

00:51:57.000 | we think is that these tools of

00:51:59.000 | fast approximate renderers,

00:52:01.000 | physics engines, and sort of very simple

00:52:03.000 | kinds of AI planning are an

00:52:05.000 | interesting first approximation

00:52:07.000 | to the kinds of common sense knowledge

00:52:09.000 | representations that evolution has built into

00:52:11.000 | our brains. So when we talk about the

00:52:13.000 | cognitive core, or how

00:52:15.000 | do babies start,

00:52:17.000 | ways in which a baby's brain isn't

00:52:19.000 | a blank slate, one interesting idea

00:52:21.000 | is that it starts with something like

00:52:23.000 | these tools, and then wrapped

00:52:25.000 | inside a framework for probabilistic inference

00:52:27.000 | -- that's what we mean by probabilistic programs --

00:52:29.000 | that can support many activities

00:52:31.000 | of common sense perception and thinking.

00:52:33.000 | So I'll just give you one example

00:52:35.000 | of what we call this intuitive physics

00:52:37.000 | engine. So this is

00:52:39.000 | work that we did in our groups, that Pete Battaglia

00:52:41.000 | and Jess Hamrick started this work

00:52:43.000 | about five years ago now,

00:52:45.000 | where we showed

00:52:47.000 | people, in some sense,

00:52:49.000 | and this is also an illustration of a kind of

00:52:51.000 | experiment that you might do. When I keep talking

00:52:53.000 | about science, like I'll show you now a couple of experiments.

00:52:55.000 | So we would show people

00:52:57.000 | simple physical scenes, like these blocks

00:52:59.000 | world scenes, and ask them to make a number of judgments.

00:53:01.000 | And the model we built does

00:53:03.000 | basically a little bit of probabilistic inference

00:53:05.000 | in a game-style physics engine.

00:53:07.000 | It perceives the physical state and

00:53:09.000 | imagines a few different possible ways

00:53:11.000 | the world could go over the next one or two

00:53:13.000 | seconds, to answer questions like

00:53:15.000 | "Will the stack of blocks fall?" or

00:53:17.000 | "If they fall, how far will they fall?" or "Which

00:53:19.000 | way will they fall?" or "What would happen if

00:53:21.000 | say, one color

00:53:23.000 | of blocks or one material, like the green

00:53:25.000 | stuff, is ten times heavier than the grey

00:53:27.000 | stuff?" or vice versa. "How will that change the

00:53:29.000 | direction of fall?" or "Look at those red and

00:53:31.000 | yellow blocks, some of

00:53:33.000 | which look like they should be falling, but aren't."

00:53:35.000 | So, why? Can you infer

00:53:37.000 | from the fact that they're not falling

00:53:39.000 | that one color block is much heavier than the

00:53:41.000 | other? Or let me show you a sort

00:53:43.000 | of a slightly weird task.

00:53:45.000 | It's like other behavioral experiments.

00:53:47.000 | Sometimes we do weird things

00:53:49.000 | so that we can test ways in which you use

00:53:51.000 | your knowledge that you didn't just

00:53:53.000 | learn from pattern recognition, but

00:53:55.000 | use it to do new kinds of tasks that you'd

00:53:57.000 | never seen before. So here's a task

00:53:59.000 | which, you know, many of you have maybe

00:54:01.000 | seen me talk about these things, so you might have seen

00:54:03.000 | this task, but probably only if you saw me give a talk

00:54:05.000 | around here before. We call this the

00:54:07.000 | red-yellow task, and again, we'll make this one interactive.

00:54:09.000 | So imagine

00:54:11.000 | that the blocks on the table are knocked

00:54:13.000 | hard enough to bump, the table's

00:54:15.000 | bumped hard enough to knock some of the blocks

00:54:17.000 | onto the floor. So you tell me, "Is it more

00:54:19.000 | likely to be red blocks or yellow blocks?" What do you say?

00:54:21.000 | Red. Okay, good.

00:54:23.000 | How about here?

00:54:25.000 | Yellow. Good.

00:54:27.000 | How about here?

00:54:29.000 | Uh-huh. Here?

00:54:31.000 | Here?

00:54:33.000 | Okay.

00:54:35.000 | Here?

00:54:37.000 | Here?

00:54:39.000 | Okay.

00:54:41.000 | So you

00:54:43.000 | just experience for yourself what it's like to

00:54:45.000 | be a subject in one of these experiments. We just

00:54:47.000 | did the experiment here. The data's all captured on

00:54:49.000 | video, sort of, right? Okay.

00:54:51.000 | You can see that

00:54:53.000 | sometimes people were very quick, other times

00:54:55.000 | people were slower. Sometimes there was a lot of

00:54:57.000 | consensus, sometimes there was a little bit less consensus.

00:54:59.000 | Right? That reflects uncertainty.

00:55:01.000 | So again, there's a long history of studying this

00:55:03.000 | scientifically,

00:55:05.000 | that, you know, but you can see

00:55:07.000 | the probabilistic

00:55:09.000 | inference at work. Probabilistic inference over

00:55:11.000 | what? Well, I would say one way to

00:55:13.000 | describe it is over one or a

00:55:15.000 | few short, low-precision simulations

00:55:17.000 | of the physics of these scenes.

00:55:19.000 | So here is what I mean by this. I'm going

00:55:21.000 | to show you a video of a game

00:55:23.000 | engine reconstruction of one of these scenes

00:55:25.000 | that simulates a small bump. So here's

00:55:27.000 | a small bump. Here's that same

00:55:29.000 | scene with the big bump. Okay. Now

00:55:31.000 | notice that at the micro level, different

00:55:33.000 | things happen. But at the cognitive

00:55:35.000 | or macro level that matters for common

00:55:37.000 | sense reasoning, the same thing happened. Namely, all

00:55:39.000 | the yellow blocks went over onto one side of the table

00:55:41.000 | and few or none of the red blocks did.

00:55:43.000 | So it didn't matter which of those simulations

00:55:45.000 | you ran in your head. You'd get the same answer in this

00:55:47.000 | case. This is one that's very easy and high

00:55:49.000 | confidence and quick. Also,

00:55:51.000 | you didn't have to run the simulation for very long.

00:55:53.000 | You only have to run it for a few time steps

00:55:55.000 | like that to see what's going to happen, or similarly here.

00:55:57.000 | You only have to run it for a few time steps.

00:55:59.000 | And it doesn't have to be even very accurate.

00:56:01.000 | Even a fair amount of imprecision

00:56:03.000 | will give you basically the same answer at the

00:56:05.000 | level that matters for common sense. So that's

00:56:07.000 | the kind of thing our model does. It runs a few

00:56:09.000 | low-precision simulations for a few

00:56:11.000 | time steps. But if you take the average of what

00:56:13.000 | happens there and you compare that with people's

00:56:15.000 | judgments, you get results like what I show

00:56:17.000 | you here. This scatter plot shows

00:56:19.000 | on the y-axis the average judgments

00:56:21.000 | of people. On the x-axis, the average judgments

00:56:23.000 | of this model. And it does a pretty good job. It's not

00:56:25.000 | perfect, but the model basically captures

00:56:27.000 | people's graded sense of what's going

00:56:29.000 | on in this scene and many of these others.

00:56:31.000 | Okay?

00:56:33.000 | And it doesn't do it with any learning.

00:56:35.000 | But I'll come back to that in a second. It just does it by

00:56:37.000 | probabilistic reasoning over a game physics simulation.

00:56:39.000 | Now we can use,

00:56:41.000 | and we have used, the same kind of technology

00:56:43.000 | to capture in very simple

00:56:45.000 | forms, really just proofs of concept at this

00:56:47.000 | point, the kind of common sense physical

00:56:49.000 | scene understanding in a child playing

00:56:51.000 | with blocks or other objects

00:56:53.000 | or in what might go on in a young child's

00:56:55.000 | understanding of other people's actions, what we call

00:56:57.000 | the intuitive psychology engine, where

00:56:59.000 | now the probabilistic programs are defined

00:57:01.000 | over these kind of very simple planning

00:57:03.000 | and perception programs. And I won't go

00:57:05.000 | into any details. I'll just point to a couple

00:57:07.000 | of papers that my group played a very

00:57:09.000 | small role in, but we provided some models

00:57:11.000 | which together with some infant researchers,

00:57:13.000 | people working on, both of these are experiments

00:57:15.000 | that were done with

00:57:17.000 | 10 or 12 month infants, so younger

00:57:19.000 | than even some of the babies I showed you before,

00:57:21.000 | but basically like that youngest baby, the one

00:57:23.000 | with the cat. Here's an example

00:57:25.000 | of showing simple physical scenes.

00:57:27.000 | These are moving objects to

00:57:29.000 | 12 month olds where they saw

00:57:31.000 | a few objects bouncing around inside a

00:57:33.000 | gumball machine and after some point

00:57:35.000 | in time, the scene gets occluded.

00:57:37.000 | You'll see the scene is occluded and then after another

00:57:39.000 | period of time, one of the objects will appear at the

00:57:41.000 | bottom. And the question is, is that

00:57:43.000 | the object you expected to see or not?

00:57:45.000 | Is it expected or surprising? The standard

00:57:47.000 | way you study what infants know is by

00:57:49.000 | what's called looking time methods.

00:57:51.000 | Just like an adult, if I show you something that's surprising

00:57:53.000 | you might look longer.

00:57:55.000 | If you're bored, you'll look away.

00:57:57.000 | So you can do that same

00:57:59.000 | kind of thing with infants and by

00:58:01.000 | measuring how long they look at a scene,

00:58:03.000 | you can measure whether you've shown them something surprising

00:58:05.000 | or not.

00:58:07.000 | There are literally hundreds of studies, if not more,

00:58:09.000 | using looking time measures

00:58:11.000 | to study what infants know.

00:58:13.000 | But only with this paper that we

00:58:15.000 | published a few years ago, did we have

00:58:17.000 | a quantitative model where we were able to show

00:58:19.000 | a relation between inverse probability

00:58:21.000 | in this case and surprise. So things which were objectively

00:58:23.000 | lower probability under one

00:58:25.000 | of these probabilistic physics simulations across

00:58:27.000 | a number of different manipulations of

00:58:29.000 | how fast the objects were, where they were

00:58:31.000 | when the scene was occluded, how long the delay was,

00:58:33.000 | various physically relevant variables. How many

00:58:35.000 | objects there were of one type or another.

00:58:37.000 | Infants' expectations connected

00:58:39.000 | with this model. Or another paper that

00:58:41.000 | we published, that one was

00:58:43.000 | done, the experiments there were done by

00:58:45.000 | Arno Tegelus in Luca Bonatti's lab.

00:58:47.000 | Here is a study that was done just

00:58:49.000 | recently by Sherry Liu in Liz

00:58:51.000 | Spelke's lab, they're at Harvard, but they're

00:58:53.000 | partners with us in CBMM,

00:58:55.000 | which was about infants' understanding of goals.

00:58:57.000 | So this is more like, again, understanding of agents

00:58:59.000 | in intuitive psychology, where in, again,

00:59:01.000 | in very simple cartoon scenes,

00:59:03.000 | you show an infant, an agent

00:59:05.000 | that seems to be doing something, like an animated

00:59:07.000 | cartoon character, but it jumps over

00:59:09.000 | a wall, or it rolls up a hill,

00:59:11.000 | or it jumps over a gap. And

00:59:13.000 | the question is, basically, how

00:59:15.000 | much does the agent want the goal that it seems

00:59:17.000 | to be trying to achieve? And what this study

00:59:19.000 | showed, and the models here were

00:59:21.000 | done by Tomer Ullman, was that infants

00:59:23.000 | appeared to be sensitive to the

00:59:25.000 | physical work done by the agent.

00:59:27.000 | The more work the agent did, in the sense

00:59:29.000 | of the integral of force

00:59:31.000 | applied over a path, the more

00:59:33.000 | the infants thought the agent

00:59:35.000 | wanted the goal.

00:59:37.000 | We think of this as representing what we sometimes

00:59:39.000 | call the naive utility calculus.

00:59:41.000 | So the idea that there's a basic calculus

00:59:43.000 | of cost and benefit,

00:59:45.000 | you know, we take actions which

00:59:47.000 | are a little bit costly to achieve goal states

00:59:49.000 | which give us some reward. That's the

00:59:51.000 | most basic way, the oldest way, to think about

00:59:53.000 | rational, intentional action. And it

00:59:55.000 | seems that even 10-month-olds understand some

00:59:57.000 | version of that, where the cost can be measured

00:59:59.000 | in physical terms.

01:00:01.000 | I see I'm running a little bit behind

01:00:03.000 | on time, and I wanted to

01:00:05.000 | leave some time for discussion. So I'll just go

01:00:07.000 | very quickly through a couple of other things,

01:00:09.000 | and happy to stay around at the end

01:00:11.000 | for discussion.

01:00:13.000 | What I

01:00:15.000 | showed you here was the science. Where does the

01:00:17.000 | engineering go? So one thing you can

01:00:19.000 | do with this is, say, build

01:00:21.000 | a machine system that can look not at a little

01:00:23.000 | animated cartoon like these baby experiments,

01:00:25.000 | but a real person doing something. And again,

01:00:27.000 | combine physical cost and

01:00:29.000 | constraints of actions with some understanding

01:00:31.000 | of the agent's utilities. That's the

01:00:33.000 | math of planning

01:00:35.000 | to figure out what they wanted.

01:00:37.000 | So look in this scene here, and

01:00:39.000 | see if you can judge which object

01:00:41.000 | the woman is reaching for. So you can see

01:00:43.000 | there's a grid of

01:00:45.000 | 4x4 objects. There's 16 objects

01:00:47.000 | here, and she's going to be reaching for one of them.

01:00:49.000 | It's going to play in slow motion,

01:00:51.000 | but raise your hand when you know which one she's reaching

01:00:53.000 | for. So just watch and raise your hand when

01:00:55.000 | you know which one she wants.

01:00:57.000 | So most of the hands are up by now.

01:01:02.000 | And notice, I was looking at your

01:01:04.000 | hands, not here, but what happened is

01:01:06.000 | most of the hands were up about the time

01:01:08.000 | when that dashed

01:01:10.000 | line shot up.

01:01:12.000 | That's not human data. You

01:01:14.000 | provided the data. This is our model. So our

01:01:16.000 | model is predicting, more or less, when

01:01:18.000 | you're able to say what her goal was.

01:01:20.000 | It's well before she actually touched the object.

01:01:22.000 | How does the model work? Again, I'll skip the details,

01:01:24.000 | but it does the same kind of

01:01:26.000 | thing that our models of those infants

01:01:28.000 | did. Namely, but in this

01:01:30.000 | case it does it with a full-body model from

01:01:32.000 | robotics. So we use what's called the MuJoCo physics

01:01:34.000 | engine, which is a standard tool

01:01:36.000 | in robotics for planning physically

01:01:38.000 | efficient reaches of, say, a humanoid robot.

01:01:40.000 | And we say, we can give

01:01:42.000 | this planner program

01:01:44.000 | a goal object as input. We can give it each of

01:01:46.000 | the possible goal objects as input and say,

01:01:48.000 | "Plan the most physically efficient action,"

01:01:50.000 | so the one that uses the least energy to get to

01:01:52.000 | that object. And then we can do a Bayesian

01:01:54.000 | inference. This is the probabilistic inference part.

01:01:56.000 | The program is the MuJoCo

01:01:58.000 | planner. But then we can

01:02:00.000 | say, "I want to do Bayesian inference

01:02:02.000 | to work backwards from what I observed,

01:02:04.000 | which was the action, to the input to that

01:02:06.000 | program. What goal was provided as

01:02:08.000 | input to the planner?" And here you can see

01:02:10.000 | the full array of 4x4

01:02:12.000 | possible inputs, and those bars that are

01:02:14.000 | moving up and down, that's the Bayesian posterior

01:02:16.000 | probability of how likely each

01:02:18.000 | of those was to be the goal. And what

01:02:20.000 | you can see is it converges on the right answer,

01:02:22.000 | at least, well, it turns out to be the ground truth

01:02:24.000 | right answer, but it's also the right answer according to what people

01:02:26.000 | think, with about the same kind of

01:02:28.000 | data that people took.

01:02:30.000 | Now you might say, "Well, okay, sure,

01:02:32.000 | if I just wanted to build a system that could detect

01:02:34.000 | what somebody was reaching for, I could generate

01:02:36.000 | a training data set of this sort of

01:02:38.000 | scene and train something up to

01:02:40.000 | analyze patterns of motion." But again,

01:02:42.000 | because the engine in your head actually

01:02:44.000 | does something we think more like this,

01:02:46.000 | it does what we call inverse planning over a

01:02:48.000 | physics model, it can apply to much more

01:02:50.000 | interesting scenes that you haven't really seen

01:02:52.000 | much of before. So take the scene on the left,

01:02:54.000 | where again you see somebody reaching

01:02:56.000 | for one of a 4x4 array of objects,

01:02:58.000 | but what you see is a strange kind of reach.

01:03:00.000 | Can you see why he's doing a strange reach?

01:03:02.000 | Up there, it's a little

01:03:04.000 | small, but you can see that he's

01:03:06.000 | reaching over something, right? It's

01:03:08.000 | actually a pane of glass, right? Do you see that?

01:03:10.000 | And then there's this other guy

01:03:12.000 | helping him, who sees what

01:03:14.000 | he wants and hands him the thing he wants.

01:03:16.000 | So how does the guy

01:03:18.000 | in the foreground see the other guy's goal?

01:03:20.000 | How does he infer his goal

01:03:22.000 | and know how to help him? And then how do we look

01:03:24.000 | at the two of them and figure out who's trying to help

01:03:26.000 | who? Or that in a scene like this one

01:03:28.000 | here, that it's not somebody trying

01:03:30.000 | to help somebody, but rather the opposite.

01:03:32.000 | So here's a model on the left

01:03:34.000 | of how that might work, and we think this is the

01:03:36.000 | kind of model needed to tackle this sort of challenge

01:03:38.000 | here. Basically, it's a model

01:03:40.000 | -- we take this model of

01:03:42.000 | planning, sort of maximal expected utility

01:03:44.000 | planning, which you can run backwards,

01:03:46.000 | but then we recursively nest these

01:03:48.000 | models inside each other. So we say, an agent

01:03:50.000 | is helping another agent. If this agent

01:03:52.000 | is acting, apparently, to us,

01:03:54.000 | seems to be maximizing an expected utility,

01:03:56.000 | that's a positive

01:03:58.000 | function of that agent's expectation

01:04:00.000 | about another agent's expected utility, and that's what it

01:04:02.000 | means to be a helper. Hindering

01:04:04.000 | is sort of the opposite, if one seems to be trying

01:04:06.000 | to lower somebody else's utility.

01:04:08.000 | And we've used these same

01:04:10.000 | kind of models to also describe

01:04:12.000 | infants' understanding of helping and hindering

01:04:14.000 | in a range of scenes. I'll just

01:04:16.000 | say one last word about learning,

01:04:18.000 | because everybody wants to know about learning, and

01:04:20.000 | the key thing here, and it's

01:04:22.000 | definitely part of any picture of AGI,

01:04:24.000 | but the thought I want to leave

01:04:26.000 | you on is really about what learning is about.

01:04:28.000 | It'll be just a few more slides, and then I'll stop, I promise.

01:04:30.000 | None of the models

01:04:32.000 | I showed you so far really did any learning.

01:04:34.000 | They certainly didn't do any task-specific learning.

01:04:36.000 | We set up a probabilistic program

01:04:38.000 | and then we let it do inference. Now that's

01:04:40.000 | not to say that we don't think people learn to do these

01:04:42.000 | things. We do. But the real learning

01:04:44.000 | goes on when you're much younger.

01:04:46.000 | Everything I showed you in basic form

01:04:48.000 | even a one-year-old baby can do.

01:04:50.000 | The basic learning goes on

01:04:52.000 | to support these kinds of abilities.

01:04:54.000 | Not that there isn't learning beyond one year, but

01:04:56.000 | the basic way you learn to, say, solve these physics

01:04:58.000 | problems is what goes on

01:05:00.000 | in the brain of a child

01:05:02.000 | between zero and twelve months. So this

01:05:04.000 | is just an example of some phenomena

01:05:06.000 | that come from the literature on infant cognitive

01:05:08.000 | development. These are very rough timelines.

01:05:10.000 | You can take pictures of this if you like.

01:05:12.000 | This is always a popular slide because it

01:05:14.000 | really is quite inspiring, I think, and I can give you

01:05:16.000 | lots of literature pointers, but I'm

01:05:18.000 | summarizing in very broad strokes with big

01:05:20.000 | error bars what we've learned

01:05:22.000 | in the field of infant cognitive development

01:05:24.000 | about when and how

01:05:26.000 | kids seem to at least come to

01:05:28.000 | certain understanding of basic aspects of physics.

01:05:30.000 | So if you really

01:05:32.000 | want to study how people learn to be intelligent,

01:05:34.000 | a lot of what you have to study are kids

01:05:36.000 | at this age. You have to study what's already

01:05:38.000 | in their brain at zero months and

01:05:40.000 | what they learn and how they learn between four,

01:05:42.000 | six, eight, ten, twelve, and so on, and

01:05:44.000 | on up beyond that.

01:05:46.000 | Now, effectively

01:05:48.000 | what that amounts to, we think, is

01:05:50.000 | if what you're learning is something like

01:05:52.000 | let's say an intuitive game

01:05:54.000 | physics engine to capture these basic

01:05:56.000 | abilities, then what we need, if we're going to

01:05:58.000 | try to reverse engineer that, is what

01:06:00.000 | you might think of as a program learning program.

01:06:02.000 | If your knowledge is in the form of a program,

01:06:04.000 | then you have to have programs that build other programs.

01:06:06.000 | This is what I was talking about at the beginning

01:06:08.000 | about learning as building models of the

01:06:10.000 | world. Or ultimately, if you think

01:06:12.000 | what we start off with is something like a game

01:06:14.000 | engine that can play any game,

01:06:16.000 | then what you have to learn is the program of the game

01:06:18.000 | that you're actually playing, or the many different games

01:06:20.000 | that you might be playing over your life. So think

01:06:22.000 | of learning as like programming the game

01:06:24.000 | engine in your head to fit with your experience

01:06:26.000 | and to fit with the possible

01:06:28.000 | actions that you seem like you can take.

01:06:30.000 | Now this is what you could call the hard problem

01:06:32.000 | of learning if you come to learning from, say, neural

01:06:34.000 | networks or other tools in machine learning.

01:06:36.000 | So what makes machine, makes most

01:06:38.000 | of machine learning go right now, and certainly what

01:06:40.000 | makes neural networks so appealing,

01:06:42.000 | is that you can set up basically a big

01:06:44.000 | function approximator that can approximate

01:06:46.000 | many of the functions you might want to do in

01:06:48.000 | a certain application or task, but

01:06:50.000 | in a way that's end-to-end differentiable

01:06:52.000 | and with a meaningful cost function.

01:06:54.000 | So you can have one of these nice optimization

01:06:56.000 | landscapes, you can compute the gradients and basically

01:06:58.000 | just roll downhill until you get to

01:07:00.000 | an optimal solution. But

01:07:02.000 | if you're talking about learning as something like search

01:07:04.000 | in the space of programs, we don't

01:07:06.000 | know how to do anything like that yet. We don't know

01:07:08.000 | how to set this up as any kind of a nice

01:07:10.000 | optimization problem with any notion of smoothness

01:07:12.000 | or gradients. Rather

01:07:14.000 | what we need is, instead of

01:07:16.000 | learning as like rolling downhill effectively,

01:07:18.000 | a process which just, if you're

01:07:20.000 | willing to wait long enough,

01:07:22.000 | some simple algorithm

01:07:24.000 | will take care of. Think of what we

01:07:26.000 | call the idea of learning as programming.

01:07:28.000 | There's a popular metaphor in

01:07:30.000 | cognitive development called the child as scientist,

01:07:32.000 | which emphasizes children

01:07:34.000 | as active theory builders and children's

01:07:36.000 | play as a kind of casual

01:07:38.000 | experimentation. But this is the

01:07:40.000 | algorithmic complement to that, what we call

01:07:42.000 | the child as coder, or around MIT we'll say

01:07:44.000 | the child as hacker. But the rest of the world

01:07:46.000 | if you say child as hacker, they think of

01:07:48.000 | someone who breaks into your email and steals

01:07:50.000 | your credit card numbers. We all know that hacking

01:07:52.000 | is making your code more awesome.

01:07:54.000 | If your

01:07:56.000 | knowledge is some kind of code, or

01:07:58.000 | library of programs, then learning

01:08:00.000 | is all the ways that a child hacks on

01:08:02.000 | their code to make it more awesome.

01:08:04.000 | More awesome can mean more accurate, but it

01:08:06.000 | can also mean faster, more elegant,

01:08:08.000 | more transportable to other

01:08:10.000 | applications or their tasks, more explainable

01:08:12.000 | to others, maybe just more entertaining.

01:08:14.000 | Children have all

01:08:16.000 | of those goals in learning. And the activities

01:08:18.000 | by which they make their code more awesome

01:08:20.000 | also correspond to many

01:08:22.000 | of the activities of coding.

01:08:24.000 | So think about all the ways on a day

01:08:26.000 | to day basis you might make your code more

01:08:28.000 | awesome.

01:08:30.000 | You might have a big library

01:08:32.000 | of existing functions with some parameters

01:08:34.000 | that you can tune on a data set. That's basically

01:08:36.000 | what you do with backprop or stochastic gradient

01:08:38.000 | descent in training a deep learning system.

01:08:40.000 | But think about all the ways in which you might

01:08:42.000 | actually modify the underlying functions. So write

01:08:44.000 | new code, or take old code from

01:08:46.000 | some other thing and map it over here,

01:08:48.000 | or make a whole new library of code, or refactor

01:08:50.000 | your code to some other

01:08:52.000 | basis that will

01:08:54.000 | work more robustly and be more extensible.

01:08:56.000 | Or transpiling, or compiling,

01:08:58.000 | or even just commenting

01:09:00.000 | your code, or asking someone else

01:09:02.000 | for their code. Again, these are all

01:09:04.000 | ways that we make our code more awesome,

01:09:06.000 | and children's learning has analogs to all of

01:09:08.000 | these that we would want to understand

01:09:10.000 | as an engineer from an algorithmic point of view.

01:09:12.000 | So in our group we've been working

01:09:14.000 | on various early steps towards

01:09:16.000 | this. And again, we don't have anything like

01:09:18.000 | program writing programs at

01:09:20.000 | the level of children's learning algorithms.

01:09:22.000 | But one example of something that we did in

01:09:24.000 | our group, which you might not have thought of being about this,

01:09:26.000 | but it's definitely the AI work we did

01:09:28.000 | that got the most attention

01:09:30.000 | in the last couple of years from our group.

01:09:32.000 | We had this paper that was in science, it was actually

01:09:34.000 | on the cover of science, sort of just

01:09:36.000 | hit the market at the right time

01:09:38.000 | if you like, and it got about 100 times more

01:09:40.000 | publicity than anything else I've ever done,

01:09:42.000 | which is partly a testament to the really great

01:09:44.000 | work that Brendan Lake, who was the first author, did

01:09:46.000 | for his PhD here, but much

01:09:48.000 | more so just about the hunger for AI systems

01:09:50.000 | at the time when we published this in 2015.

01:09:52.000 | And we built a machine system

01:09:54.000 | that, the way we described it, was

01:09:56.000 | doing human level concept learning for

01:09:58.000 | simple, very simple visual

01:10:00.000 | concepts, these handwritten characters in many

01:10:02.000 | of the world's alphabets. For those of you who know

01:10:04.000 | the famous MNIST dataset, the dataset

01:10:06.000 | of handwritten digits 0 through 10,

01:10:08.000 | or through 9, sorry, that drove

01:10:10.000 | so much good research in deep learning

01:10:12.000 | and pattern recognition. It did that

01:10:14.000 | not because Jan Lekun, who put that together,

01:10:16.000 | or Jeff Hinton, who did a lot of work on deep learning

01:10:18.000 | with MNIST, they weren't interested

01:10:20.000 | fundamentally in character recognition,

01:10:22.000 | they saw that as a very simple testbed

01:10:24.000 | for developing more general ideas.

01:10:26.000 | And similarly, we did this work on getting

01:10:28.000 | machines to do a kind

01:10:30.000 | of one-shot learning of generative models

01:10:32.000 | also to develop

01:10:34.000 | more general ideas. We saw this as learning

01:10:36.000 | very simple, little mini, probabilistic

01:10:38.000 | programs. In this case, what are those programs?

01:10:40.000 | They're the programs you use to draw a character.

01:10:42.000 | So ask yourself, how can you look at any one

01:10:44.000 | of these characters and see, in a sense,

01:10:46.000 | how somebody might draw it? The way we tested

01:10:48.000 | this in our system was this little

01:10:50.000 | visual Turing test, where we showed

01:10:52.000 | people one character in a novel alphabet

01:10:54.000 | and we said, "Draw another one."

01:10:56.000 | And then we compared nine people, like say, on the left

01:10:58.000 | and nine samples from our machine,

01:11:00.000 | say, on the right, and we said,

01:11:02.000 | we asked other people, "Could you tell which

01:11:04.000 | was the human drawing another example, or imagining

01:11:06.000 | another example, and which was the machine?"

01:11:08.000 | And people couldn't tell. When I said, "One's on the left,

01:11:10.000 | one's on the right," I don't actually remember.

01:11:12.000 | And on different ones, you can see if you can tell. It's very

01:11:14.000 | hard to tell. Can you tell which is, for each

01:11:16.000 | one of these characters, which new set of

01:11:18.000 | examples were drawn by a human versus a machine?

01:11:20.000 | Here's the right answer.

01:11:22.000 | And probably you couldn't tell.

01:11:24.000 | The way we did this was by assembling

01:11:26.000 | a simple kind of program learning program.

01:11:28.000 | So we basically said, when you draw

01:11:30.000 | a character, you're assembling strokes and

01:11:32.000 | substrokes with goals and subgoals

01:11:34.000 | that produce ink on the page.

01:11:36.000 | And when you see a character, you're working backwards

01:11:38.000 | to figure out, what was the program, the most

01:11:40.000 | efficient program that did that? So you're basically

01:11:42.000 | inverting a probabilistic program,

01:11:44.000 | doing Bayesian inference to the program

01:11:46.000 | most likely to have generated what you saw.

01:11:48.000 | This is one small step,

01:11:50.000 | we think, towards being able to learn

01:11:52.000 | programs, to being able to learn something ultimately

01:11:54.000 | like a whole game engine program.

01:11:56.000 | The last thing I'll leave you with is just a pointer

01:11:58.000 | to sort of work in action. So this

01:12:00.000 | is some work being done by a current PhD

01:12:02.000 | student who works partly with me, but also with

01:12:04.000 | Armando Solar-Lizama in CSAIL.

01:12:06.000 | This is Kevin Ellis. It's an example of

01:12:08.000 | what's now, I think, again, an emerging

01:12:10.000 | exciting area in AI, well

01:12:12.000 | beyond anything that we're doing,

01:12:14.000 | is combining techniques from

01:12:16.000 | where Armando comes from, which is the world of

01:12:18.000 | programming languages, not machine learning or

01:12:20.000 | AI, but tools from programming

01:12:22.000 | languages which can be used to automatically

01:12:24.000 | synthesize code, okay, with

01:12:26.000 | the machine learning toolkit, in this case a kind of

01:12:28.000 | Bayesian minimum description length

01:12:30.000 | idea, to be able to make, again,

01:12:32.000 | what is really one small step towards

01:12:34.000 | machines that can learn programs by

01:12:36.000 | basically trying to efficiently find

01:12:38.000 | the shortest, simplest program which can

01:12:40.000 | capture some data set.

01:12:42.000 | So we think by combining these kinds of tools,

01:12:44.000 | in this case, let's say, from Bayesian inference

01:12:46.000 | over programs with a number of tools

01:12:48.000 | that have been developed in other areas

01:12:50.000 | of computer science that don't look anything

01:12:52.000 | or haven't been considered to be machine learning

01:12:54.000 | or AI, like programming languages,

01:12:56.000 | it's one of the many ways that going

01:12:58.000 | forward we're going to be able to build smarter, more

01:13:00.000 | human-like machines. So just

01:13:02.000 | to end then, what I've tried to tell you

01:13:04.000 | here is, first of all,

01:13:06.000 | identify the ways in which human intelligence

01:13:08.000 | goes beyond pattern recognition

01:13:10.000 | to really all these activities of modeling

01:13:12.000 | the world, okay, to give you a sense

01:13:14.000 | of some of the domains where we can start to study this

01:13:16.000 | in common sense scene understanding, for example,

01:13:18.000 | or something

01:13:20.000 | like

01:13:22.000 | one-shot learning, for example, like what we were just doing

01:13:24.000 | there, or learning as programming the engine in your

01:13:26.000 | head, okay, and

01:13:28.000 | to give you a sense of some of the technical

01:13:30.000 | tools, probabilistic programs,

01:13:32.000 | program synthesis, game engines, for

01:13:34.000 | example, as well as a little bit of deep learning

01:13:36.000 | that, bringing together, we're starting

01:13:38.000 | to be able to make these things real.

01:13:40.000 | Now, that's the science

01:13:42.000 | agenda and the reverse engineering agenda,

01:13:44.000 | but think about, for those of you who are interested in technology,

01:13:46.000 | what are the many big AI

01:13:48.000 | frontiers that this opens up?

01:13:50.000 | So the one I'm most excited about

01:13:52.000 | is this idea which I've highlighted

01:13:54.000 | here in our big research agenda. This is the one

01:13:56.000 | I'm most excited about to work on for the,

01:13:58.000 | you know, it could be the rest of my career, honestly,

01:14:00.000 | but it's really what is

01:14:02.000 | the oldest and maybe the best dream

01:14:04.000 | of AI researchers of how to

01:14:06.000 | build a human-like intelligence system,

01:14:08.000 | a real AGI system.

01:14:10.000 | It's the idea that Turing proposed when he proposed

01:14:12.000 | the Turing test, or Marvin Minsky proposed

01:14:14.000 | this at different times in his life, or many people have

01:14:16.000 | proposed this, right, which is to build a system

01:14:18.000 | that grows into intelligence the way a human

01:14:20.000 | does, that starts like a baby and learns like

01:14:22.000 | a child, and I've tried to show you how we're

01:14:24.000 | starting to be able to understand those

01:14:26.000 | things. What a baby's mind starts with,

01:14:28.000 | how children actually learn,

01:14:30.000 | and looking forward, we might

01:14:32.000 | imagine that someday we'll be able to build machines

01:14:34.000 | that can do this. I think we can actually start working

01:14:36.000 | on this right now, and

01:14:38.000 | that's something that we're doing in our group.

01:14:40.000 | So if that kind of thing excites you, then

01:14:42.000 | I encourage you to work on it, maybe even with us,

01:14:44.000 | or if any one of these other activities of human

01:14:46.000 | intelligence excite you, I

01:14:48.000 | think taking the kind of science-based

01:14:50.000 | reverse engineering approach that we're doing

01:14:52.000 | and then trying to put that into

01:14:54.000 | engineering practice,

01:14:56.000 | this is not just a

01:14:58.000 | possible route, but I think it's

01:15:00.000 | quite possibly the most valuable route

01:15:02.000 | that you could work on right now

01:15:04.000 | to try to actually achieve at least some kind

01:15:06.000 | of artificial general intelligence,

01:15:08.000 | especially the kind of intelligence

01:15:10.000 | AI system that's going to live

01:15:12.000 | in a human world and interact with humans.

01:15:14.000 | There's many kinds of AI systems that could live in

01:15:16.000 | worlds of data that none of us can understand or will

01:15:18.000 | ever live in ourselves, but if you want to build

01:15:20.000 | machines that can live in our world and interact with

01:15:22.000 | us the way we are used to interacting

01:15:24.000 | with other people, then I think

01:15:26.000 | this is a route that you should consider.

01:15:28.000 | Thank you.

01:15:30.000 | [Applause]

01:15:38.000 | Hi there.

01:15:40.000 | So, earlier in the talk you expressed some skepticism

01:15:42.000 | about whether or not industry

01:15:44.000 | would get us to understanding human-level intelligence.

01:15:46.000 | It seems that there's a couple of trends

01:15:48.000 | that favour industry. One is that

01:15:50.000 | industry is better than academia at accumulating

01:15:52.000 | resources and ploughing back into

01:15:54.000 | the topic, and it seems at the

01:15:56.000 | moment we've got a bit of brain drain going on

01:15:58.000 | from academia into industry,

01:16:00.000 | and that seems like an ongoing trend.

01:16:02.000 | If you look at something like learning to fly

01:16:04.000 | or learning to fly into space,

01:16:06.000 | then it looks like the story is

01:16:08.000 | one of industry kind of taking over

01:16:10.000 | the field and going

01:16:12.000 | off on its own a little bit.

01:16:14.000 | Academics still have a role,

01:16:16.000 | but industry kind of dominates.

01:16:18.000 | Is industry going to overtake the field, do you think?

01:16:20.000 | Well, that's a really good question, and it's got

01:16:22.000 | several good questions packed into one there.

01:16:24.000 | I didn't mean to say,

01:16:26.000 | this wasn't meant to say, "Go academia,

01:16:28.000 | bad industry."

01:16:30.000 | What I tried to say

01:16:32.000 | was the approaches that are

01:16:34.000 | currently getting the most attention in industry

01:16:36.000 | and that are really, because they're really the most

01:16:38.000 | valuable ones right now for the short term,

01:16:40.000 | any industry is really focused on what it can do,

01:16:42.000 | what are the value propositions

01:16:44.000 | on basically a two-year time scale at most.

01:16:46.000 | If you ask, say, Google researchers to take the most

01:16:48.000 | prominent example, that's pretty much

01:16:50.000 | what they'll all tell you.

01:16:52.000 | Maybe things that might

01:16:54.000 | pay off initially in

01:16:56.000 | two years, but maybe take five years or more to really

01:16:58.000 | develop. But if you can't show that it's going to

01:17:00.000 | do something practical for us in two years

01:17:02.000 | in a way that matters for our bottom line,

01:17:04.000 | then it's not really worth doing.

01:17:06.000 | What I'm

01:17:08.000 | talking about is the technologies which right

01:17:10.000 | now industry sees as meeting

01:17:12.000 | that specification. What I'm

01:17:14.000 | saying is right now I think

01:17:16.000 | that's not

01:17:18.000 | where the route is to something like human-like,

01:17:20.000 | but not the most valuable promising

01:17:22.000 | route to human-like kinds of AI

01:17:24.000 | systems. But I hope that

01:17:26.000 | like in the case as you said,

01:17:28.000 | the basic research that we're doing now will be

01:17:30.000 | successful enough that it will

01:17:32.000 | get the attention of industry when the time is right.

01:17:34.000 | I hope

01:17:36.000 | at some point

01:17:38.000 | at least the engineering

01:17:40.000 | side will have to be done in industry,

01:17:42.000 | not just in academia.

01:17:44.000 | But you're also pointing to issues of like brain

01:17:46.000 | drain and other things like that.

01:17:48.000 | These are real issues confronting our community. I think

01:17:50.000 | everybody knows this and I'm sure this will come up

01:17:52.000 | multiple times here, which is

01:17:54.000 | I think we have to find

01:17:56.000 | ways to, even now, to combine

01:17:58.000 | the best of the ideas, the energy

01:18:00.000 | and the resources of academia and industry

01:18:02.000 | if we want to keep doing

01:18:04.000 | basically something interesting.

01:18:06.000 | If we just want to redefine

01:18:08.000 | AI to be whatever people currently

01:18:10.000 | call AI but scaled up,

01:18:12.000 | then fine, forget about it.

01:18:14.000 | Or if we just want to say, let me

01:18:16.000 | and people like me do what we're doing at

01:18:18.000 | what industry would consider a snail's pace on

01:18:20.000 | toy problems, okay, fine.

01:18:22.000 | But if we want to,

01:18:24.000 | if I want to take what I'm doing to

01:18:26.000 | the level that will really be

01:18:28.000 | paying off the level that industry can

01:18:30.000 | appreciate or just that really has technological

01:18:32.000 | impact on a broad scale,

01:18:34.000 | or I think if industry wants

01:18:36.000 | to take what it's doing and really

01:18:38.000 | build machines that are actually intelligent

01:18:40.000 | or machine learning that actually

01:18:42.000 | learns like a person, then I think we need each other

01:18:44.000 | now and not just at some point in the future.

01:18:46.000 | So this is a general challenge for

01:18:48.000 | MIT and for everywhere and for Google.

01:18:50.000 | We just spent a few days

01:18:52.000 | talking to Google about exactly this issue.

01:18:54.000 | In fact, this was a talk I prepared

01:18:56.000 | partly for that purpose.

01:18:58.000 | We wanted to raise those issues and it's just

01:19:00.000 | really, I don't know what,

01:19:02.000 | rather I can think of some solutions

01:19:04.000 | to the problem

01:19:06.000 | of what you could call brain drain from the academic

01:19:08.000 | point of view or what you could call just

01:19:10.000 | narrowing in into certain local minima in the

01:19:12.000 | industry point of view. But they will require

01:19:14.000 | the leadership of both

01:19:16.000 | academic institutions like MIT and

01:19:18.000 | companies like Google being creative about how they

01:19:20.000 | might work together in ways that are a little bit outside of

01:19:22.000 | their comfort zone. I hope that will start to happen

01:19:24.000 | including at MIT

01:19:26.000 | and at many other universities

01:19:28.000 | and at companies like Google and many others

01:19:30.000 | and I think we need it to happen for the health of

01:19:32.000 | all parties concerned.

01:19:34.000 | - Okay, thank you very much. - Thanks.

01:19:36.000 | - I'm curious about

01:19:38.000 | sort of the premise that you gave

01:19:40.000 | that one of the big gaps

01:19:42.000 | missing at determining intelligence

01:19:44.000 | is the fact that we need to

01:19:46.000 | teach machines how to

01:19:48.000 | recognize models. And I'm

01:19:50.000 | curious as to what you think

01:19:52.000 | sort of non-

01:19:54.000 | goal-oriented

01:19:56.000 | cognitive activity comes into play

01:19:58.000 | there. Things like feelings and emotions

01:20:00.000 | and why

01:20:02.000 | you don't think that

01:20:04.000 | might not necessarily be like

01:20:06.000 | the most

01:20:08.000 | important question.

01:20:10.000 | - The only reason emotions didn't appear

01:20:12.000 | on my slide is because

01:20:14.000 | there's a few reasons, but the slide is only so big.

01:20:16.000 | I wanted the font to be big and readable for

01:20:18.000 | such an important slide. I have

01:20:20.000 | versions of my slide in which I do talk about that.

01:20:22.000 | Okay.

01:20:24.000 | It's not that I think feelings

01:20:26.000 | or emotions aren't important. I think they are important

01:20:28.000 | and I used to not have many insights

01:20:30.000 | about what to do about them,

01:20:32.000 | but actually partly based on some of my colleagues

01:20:34.000 | here at MIT,

01:20:36.000 | BCS, Laura Schultz and Rebecca Sachs,

01:20:38.000 | two of my cognitive colleagues who I

01:20:40.000 | work closely with, they've been

01:20:42.000 | starting to do research

01:20:44.000 | on how people understand emotions, both

01:20:46.000 | their own and others, and we've been starting to work

01:20:48.000 | with them on computational models. So that's actually something

01:20:50.000 | I'm actively interested in and even working on.

01:20:52.000 | But I would say, and again for those of you

01:20:54.000 | who study emotion or know about this, actually you're going to have

01:20:56.000 | Lisa coming in, right?

01:20:58.000 | She's going to basically say a version of the same thing,

01:21:00.000 | I think. The deepest way to understand,

01:21:02.000 | she's one of the world's experts on this,

01:21:04.000 | the deepest way to understand emotion is

01:21:06.000 | very much based on our mental models of

01:21:08.000 | ourselves, of the situation we're in, and of other people.

01:21:10.000 | Think about, for example,

01:21:12.000 | all of the different

01:21:14.000 | I mean, if you think

01:21:16.000 | about, I mean again, Lisa will talk all about this,

01:21:18.000 | but if you think about emotion as just a very

01:21:20.000 | small set of what are sometimes called basic emotions,

01:21:22.000 | like being happy or angry

01:21:24.000 | or sad or

01:21:26.000 | those are a

01:21:28.000 | small number of them, right? There's usually only

01:21:30.000 | a few, right?

01:21:32.000 | You might not say, you might see that

01:21:34.000 | as somehow like very basic things

01:21:36.000 | that are opposed to some kind of cognitive activity.

01:21:38.000 | But think about all the different words

01:21:40.000 | we have for emotion, right?

01:21:42.000 | For example, think about

01:21:44.000 | a famous cognitive emotion like

01:21:46.000 | regret. What does it mean to feel regret

01:21:48.000 | or frustration, right?

01:21:50.000 | To know both for yourself

01:21:52.000 | when you're not just feeling

01:21:54.000 | kind of down or negative, but you're feeling

01:21:56.000 | regret, that means something like

01:21:58.000 | I have to feel like there's a situation

01:22:00.000 | that came out differently from how I

01:22:02.000 | hoped, and I realized I could have

01:22:04.000 | done something differently, right?

01:22:06.000 | So that means you have to be able to understand,

01:22:08.000 | you have to have a model, you have to be able to

01:22:10.000 | do a kind of counterfactual reasoning and to think

01:22:12.000 | oh, if only I had acted a different way, then I

01:22:14.000 | can predict that the world would have come out differently, and that's the

01:22:16.000 | situation I wanted, but instead it came out this other way,

01:22:18.000 | right? Or think about frustration

01:22:20.000 | again, that requires something like

01:22:22.000 | understanding, okay, I've tried a bunch of times, I thought

01:22:24.000 | this would work, but it doesn't seem to be working,

01:22:26.000 | maybe I'm ready to give up.

01:22:28.000 | Those are all, those are very

01:22:30.000 | important human emotions.

01:22:32.000 | We have to understand, to understand ourselves,

01:22:34.000 | we need that, to understand other people, to understand

01:22:36.000 | communication, but those are all filtered

01:22:38.000 | through the kinds of models of action

01:22:40.000 | that I was, just the ones I was talking about

01:22:42.000 | here with these, say, cost-benefit analyses of action.

01:22:44.000 | So what I'm, so I'm just trying to say I think

01:22:46.000 | this is very basic stuff, but

01:22:48.000 | that will be the basis for building

01:22:50.000 | I think better engineering style

01:22:52.000 | models of the full spectrum of human emotion

01:22:54.000 | beyond just like, well, I'm feeling good or bad

01:22:56.000 | or scared, okay? And if, I think

01:22:58.000 | when you see Lisa, she will, in her own way,

01:23:00.000 | say something very similar.

01:23:02.000 | Interesting. Thanks.

01:23:04.000 | Yeah.

01:23:06.000 | Thanks, Josh, for your nice talk. So all is about

01:23:08.000 | human cognition and try to build a model

01:23:10.000 | to mimic those cognition, but you

01:23:12.000 | don't, how much could help you to understand

01:23:14.000 | how the circuit implement those things?

01:23:16.000 | You mean like the circuits in the brain?

01:23:18.000 | Yeah.

01:23:20.000 | Is that what you work on by any chance?

01:23:22.000 | Sorry, what? Is that what you work on by any chance?

01:23:24.000 | Yeah. Yeah, I know. I'm kidding. Yeah.

01:23:26.000 | So in the Center for Brains, Minds, and Machines,

01:23:28.000 | as well as in Brain and Cognitive Science,

01:23:30.000 | yeah, I have a number of colleagues

01:23:32.000 | who study the actual hardware basis

01:23:34.000 | of this stuff in the brain, and that

01:23:36.000 | includes like the large-scale architecture

01:23:38.000 | of the brain, say like what Nancy Kamisher,

01:23:40.000 | Rebecca Sachs study with functional brain imaging,

01:23:42.000 | or the more detailed circuitry, which usually

01:23:44.000 | requires recording from, say, non-human brains,

01:23:46.000 | right, at the level of individual neurons

01:23:48.000 | and connections between neurons. All right.

01:23:50.000 | So I'm very interested in those things,

01:23:52.000 | although it's not mostly what I work on, right?

01:23:54.000 | But I would say, you know, again, like in

01:23:56.000 | many other areas of science, certainly in neuroscience,

01:23:58.000 | the kind of work I'm talking about here

01:24:00.000 | in a sort of classic reductionist program

01:24:02.000 | sets the target for what we might look for.

01:24:04.000 | Like if I just want to go, I mean,

01:24:06.000 | I would, what I would assert, right,

01:24:08.000 | or my working conjunction,

01:24:10.000 | is that I would say, "Okay, so I'm going to

01:24:12.000 | do this, right, or my working conjecture

01:24:14.000 | is that if you do the kind of work

01:24:16.000 | that I'm talking about here,

01:24:18.000 | it gives you the right targets,

01:24:20.000 | or it gives you a candidate set of targets

01:24:22.000 | to look for, what are the neural circuits

01:24:24.000 | computing, right? Whereas if you just

01:24:26.000 | go in and just say, start poking

01:24:28.000 | around in the brain, or have some

01:24:30.000 | idea that what you're going to try to do

01:24:32.000 | is find the neural circuits which underlie

01:24:34.000 | behavior, without a sense of the computations

01:24:36.000 | needed to produce those behaviors,

01:24:38.000 | I think it's going to be very difficult

01:24:40.000 | to know what to look for,

01:24:42.000 | and to know when you've found

01:24:44.000 | even viable answers.

01:24:46.000 | So I think that's the standard kind of

01:24:48.000 | reductionist program,

01:24:50.000 | but it's not,

01:24:52.000 | I also think it's not

01:24:54.000 | one that is divorced from the study

01:24:56.000 | of neural circuits. It's also one,

01:24:58.000 | if you look at the broad picture of reverse

01:25:00.000 | engineering, it's one where

01:25:02.000 | neural circuits and understanding

01:25:04.000 | the circuits in the brain play

01:25:06.000 | an absolutely critical role, okay?

01:25:08.000 | I would say,

01:25:10.000 | when you look at the brain at the hardware level

01:25:12.000 | as an engineer, I'm mostly looking at

01:25:14.000 | the software level, right? But when you look at the hardware level,

01:25:16.000 | there are some remarkable properties.

01:25:18.000 | One remarkable property again is

01:25:20.000 | how much parallelism there is,

01:25:22.000 | and in many ways how fast the computations

01:25:24.000 | are, okay? Neurons are slow,

01:25:26.000 | but the computations of intelligence are very fast.

01:25:28.000 | So how do we get elements that are

01:25:30.000 | in some sense quite slow in their time

01:25:32.000 | constant to produce such intelligent

01:25:34.000 | behavior so quickly? That's a great mystery, and I think

01:25:36.000 | if we understood that, it would have payoff

01:25:38.000 | for building all sorts of

01:25:40.000 | basically application-embedded

01:25:42.000 | circuits, okay? But also

01:25:44.000 | maybe most important is the power consumption,

01:25:46.000 | and again, many people have

01:25:48.000 | noted this, right? If you look at

01:25:50.000 | the power consumption, the power that the brain

01:25:52.000 | consumes, like what did I eat today, okay?

01:25:54.000 | Almost nothing.

01:25:56.000 | My daughter, who's

01:25:58.000 | again, she's doing an internship here, she literally

01:26:00.000 | yesterday, all she ate was a burrito,

01:26:02.000 | and yet she wrote 300 lines of code

01:26:04.000 | for her internship project on

01:26:06.000 | a really cool computational linguistics

01:26:08.000 | project. So somehow she turned a burrito into

01:26:10.000 | a model of child language

01:26:12.000 | acquisition, okay? But how did she

01:26:14.000 | do that, or how do any of us do this, right?

01:26:16.000 | Where if you look at the power that we consume

01:26:18.000 | when we simulate even a very, very small

01:26:20.000 | chunk of cortex on our conventional

01:26:22.000 | hardware, or we do any kind of machine

01:26:24.000 | learning thing, we have systems which are

01:26:26.000 | very, very, very, very far

01:26:28.000 | from the power of the human brain computationally,

01:26:30.000 | but in terms of physical

01:26:32.000 | energy consumed,

01:26:34.000 | way past what any individual

01:26:36.000 | brain is doing. So how do we get circuitry

01:26:38.000 | of any sort, biological or

01:26:40.000 | just any physical circuits, to be as smart

01:26:42.000 | as we are with as little

01:26:44.000 | energy as we are? This is

01:26:46.000 | a huge problem for basically every

01:26:48.000 | area of engineering, right? If you want

01:26:50.000 | to have any kind

01:26:52.000 | of robot, the power consumption is a key

01:26:54.000 | bottleneck. Same for self-driving cars.

01:26:56.000 | If we want to build AI without

01:26:58.000 | contributing to global warming

01:27:00.000 | and climate change, let alone

01:27:02.000 | use AI to solve climate change, we really

01:27:04.000 | need to address these issues, and the brain

01:27:06.000 | is a huge guide there.

01:27:08.000 | I think there are some people who are

01:27:10.000 | really starting to think about this. How can we, say, for example,

01:27:12.000 | build somehow brain-inspired

01:27:14.000 | computers which are

01:27:16.000 | very, very low power, but maybe only approximate?

01:27:18.000 | So I'm thinking here of Joe Bates.

01:27:20.000 | I don't know if any of you know Joe. He's

01:27:22.000 | been around MIT and other places

01:27:24.000 | for quite a while. Can I tell them about

01:27:26.000 | your company? So Joe has

01:27:28.000 | a startup in Kendall Square

01:27:30.000 | called Singular Computing, and they have some

01:27:32.000 | very interesting ideas, including some actual

01:27:34.000 | implemented technology

01:27:36.000 | for low-power, approximate computing

01:27:38.000 | in a sort of a brain-like way that might

01:27:40.000 | lead to possibly even, like, the ability

01:27:42.000 | to build something -- this is Joe's dream --

01:27:44.000 | to build something that's about the size of this table

01:27:46.000 | but that has a billion cores, a billion cores,

01:27:48.000 | and runs on a reasonable

01:27:50.000 | kind of power consumption. I would love to have

01:27:52.000 | such a machine. If anybody

01:27:54.000 | wants to help Joe build it, I think he'd love to talk

01:27:56.000 | to you. But it's

01:27:58.000 | one of a number of ideas. I mean,

01:28:00.000 | Google X, people are working on similar things.

01:28:02.000 | Probably most of the major chip companies

01:28:04.000 | are also inspired by this idea.

01:28:06.000 | And I think, even if you didn't

01:28:08.000 | think you were interested in the brain, if you want to build

01:28:10.000 | the kind of AI we're talking about

01:28:12.000 | and run it on physical hardware of any sort,

01:28:14.000 | and understanding how the brain's circuits

01:28:16.000 | compute what they do,

01:28:18.000 | what I'm talking about, with

01:28:20.000 | as little power as they do,

01:28:22.000 | I don't know any better place to look.

01:28:24.000 | It seems like a lot of

01:28:26.000 | the improvements in AI have been driven

01:28:28.000 | by increasing computational power.

01:28:30.000 | How far would you say --

01:28:32.000 | You mean like GPUs or CPUs?

01:28:34.000 | How far would you say we are

01:28:36.000 | from hardware that could run a general

01:28:38.000 | artificial intelligence?

01:28:40.000 | Of the kind that I'm talking about?

01:28:42.000 | Yeah, I don't know. I'll start with a billion cores

01:28:44.000 | and then we'll see.

01:28:46.000 | I mean, I think we're --

01:28:48.000 | I think there's no way to answer that

01:28:50.000 | question in a way that's software independent.

01:28:52.000 | I don't know how to do that.

01:28:54.000 | But I think that

01:28:56.000 | it's --

01:28:58.000 | I don't know.

01:29:00.000 | When you say how far are we, you mean how far

01:29:02.000 | am I with the resources I have right now?

01:29:04.000 | How far am I if Google decides

01:29:06.000 | to put all of its resources at my disposal

01:29:08.000 | like they might if I were working at DeepMind?

01:29:10.000 | I don't know

01:29:12.000 | the answer to that question.

01:29:14.000 | I think what we can say is

01:29:16.000 | this. Individual neurons --

01:29:18.000 | I mean, again, this goes back to another reason

01:29:20.000 | to study neural circuits.

01:29:22.000 | If you look at what we currently call neural networks

01:29:24.000 | in the AI side, the model of a neuron

01:29:26.000 | is this very, very simple thing.

01:29:28.000 | Individual neurons are not only

01:29:30.000 | much more complex, but have a lot more

01:29:32.000 | computational power. It's not clear how they

01:29:34.000 | use it or whether they use it,

01:29:36.000 | but I think it's just as likely that a neuron

01:29:38.000 | is something like a relu

01:29:40.000 | is that a neuron is something like a computer.

01:29:42.000 | Like, one neuron in your brain is

01:29:44.000 | more like a CPU node,

01:29:46.000 | maybe. And thus,

01:29:48.000 | the 10 billion or trillion --

01:29:50.000 | the large number of neurons

01:29:52.000 | in your brain --

01:29:54.000 | I think it's like 10 billion cortical pyramidal neurons

01:29:56.000 | or something -- might be like

01:29:58.000 | 10 billion cores.

01:30:00.000 | That's at least as plausible, I think, to me

01:30:02.000 | as any other estimate.

01:30:04.000 | I think we're definitely

01:30:06.000 | on the underside with very big error bars.

01:30:08.000 | I completely agree that --

01:30:10.000 | or if this is what you might be suggesting,

01:30:12.000 | and going back to my answer

01:30:14.000 | to your question, I don't think we're going to get to what

01:30:16.000 | I'm talking about or anything like a real brain

01:30:18.000 | scale without major

01:30:20.000 | innovations on the hardware side.

01:30:22.000 | It's interesting that what drove those

01:30:24.000 | innovations that support

01:30:26.000 | current AI was mostly not AI.

01:30:28.000 | It was the video game industry.

01:30:30.000 | When I point to

01:30:32.000 | the video game engine in your head, that's a similar

01:30:34.000 | thing that was driven by the video game industry

01:30:36.000 | on the software side. I think we

01:30:38.000 | should all play as many video games as we can

01:30:40.000 | and contribute to the growth of

01:30:42.000 | the video game industry.

01:30:44.000 | You can see this

01:30:46.000 | in very -- there are companies out there

01:30:48.000 | for example, there's a company called Improbable,

01:30:50.000 | which is a London company,

01:30:52.000 | a London-based startup, a pretty

01:30:54.000 | sizable startup at this point, which is building

01:30:56.000 | something that they call Spatial OS,

01:30:58.000 | which is -- it's not a

01:31:00.000 | hardware idea, but it's a kind of software

01:31:02.000 | idea for very, very big distributed computing

01:31:04.000 | environments to run

01:31:06.000 | much, much more complex, realistic

01:31:08.000 | simulations of the world for much more interesting, immersive,

01:31:10.000 | permanent video games. I think that's

01:31:12.000 | one thing that might -- hopefully that will lead to more

01:31:14.000 | fun, new kinds of games.

01:31:16.000 | But that's one example of where

01:31:18.000 | we might look to that industry to drive

01:31:20.000 | some of the -- just computer systems,

01:31:22.000 | really hardware and software

01:31:24.000 | systems that will

01:31:26.000 | take our game to the next level.

01:31:28.000 | Josh, understanding

01:31:30.000 | on the algorithmic level

01:31:32.000 | or cognitive level is just to

01:31:34.000 | understanding the learning, the meaning of

01:31:36.000 | learning would be how to predict.

01:31:38.000 | But on the circuit level it's different.

01:31:40.000 | But at the what level? On the circuit level.

01:31:42.000 | Well, of course it's

01:31:44.000 | different, right? But already

01:31:46.000 | I think you made a mistake there, honestly.

01:31:48.000 | Like, you said the cognitive level is learning how to

01:31:50.000 | predict, but I'm not sure what you mean by that.

01:31:52.000 | There's many things you could mean, and what our cognitive

01:31:54.000 | science is about is learning which of those versions --

01:31:56.000 | like, I don't think it's learning how to predict. I think

01:31:58.000 | it's learning what you need to know to plan

01:32:00.000 | actions and to -- you know, all those things.

01:32:02.000 | Like, it's not just about predicting.

01:32:04.000 | Because there are things we can imagine that

01:32:06.000 | you would never predict because they would never

01:32:08.000 | happen unless we somehow make the world

01:32:10.000 | different. So generalization, sorry, not

01:32:12.000 | predicting. When your model could generalize.

01:32:14.000 | But especially in the transfer learning that

01:32:16.000 | you are interested in, a few hundred neurons in

01:32:18.000 | prefrontal cortex, they could generalize a lot.

01:32:20.000 | But not

01:32:22.000 | kind of a Bayesian model could do that.

01:32:24.000 | You said,

01:32:26.000 | but a Bayesian model won't do that?

01:32:28.000 | Or they don't do it the way a Bayesian model

01:32:30.000 | does? For sure, because that's in the abstract level.

01:32:32.000 | Well, I mean, how do you

01:32:34.000 | really know?

01:32:36.000 | I mean, what does it mean to say that some neurons

01:32:38.000 | do it? So maybe another way

01:32:40.000 | to put this is to say, look, we have a certain math

01:32:42.000 | that we use to capture these -- you

01:32:44.000 | could call it abstract, or I call it software

01:32:46.000 | level abstractions, right? I mean,

01:32:48.000 | all engineering is based on some kind of abstraction.

01:32:50.000 | But you might have a circuit level abstraction,

01:32:52.000 | a certain kind of hardware level that you're

01:32:54.000 | interested in describing the brain at.

01:32:56.000 | And I'm mostly working at or starting from a more

01:32:58.000 | software level of abstraction. They're all abstractions.

01:33:00.000 | We're not talking about molecules here.

01:33:02.000 | We're talking about some abstract notion

01:33:04.000 | of maybe a circuit, or of a program.

01:33:06.000 | Now it's a really interesting

01:33:08.000 | question. If I look at some circuits,

01:33:10.000 | how do I know what program they're implementing?

01:33:12.000 | If I look at the circuits in this machine,

01:33:14.000 | could I tell what program they're implementing?

01:33:16.000 | Well, maybe, but certainly it would be a lot easier

01:33:18.000 | if I knew something about what programs they might be implementing

01:33:20.000 | before I start to look at the circuitry. If I just

01:33:22.000 | looked at the circuitry without

01:33:24.000 | knowing what a program was, or what programs

01:33:26.000 | the thing might be doing, or what kind of

01:33:28.000 | programming components would be

01:33:30.000 | mappable to circuits in different ways,

01:33:32.000 | I don't even know how I'd begin

01:33:34.000 | to answer that question. So I think

01:33:36.000 | we've made some progress at understanding

01:33:38.000 | what neurons are doing in certain

01:33:40.000 | low-level parts of sensory system

01:33:42.000 | and certain parts of the motor system, like

01:33:44.000 | primary motor cortex. Basically, the parts

01:33:46.000 | of the neurons that are closest to the inputs and outputs

01:33:48.000 | of the brain, where

01:33:50.000 | we don't--you could say

01:33:52.000 | we don't need the kind of software

01:33:54.000 | abstractions that I'm talking about,

01:33:56.000 | or where we sort of agree on

01:33:58.000 | what those things already are, so we can make

01:34:00.000 | enough progress on knowing what to look for

01:34:02.000 | and how to know when we've found it.

01:34:04.000 | But if you want to talk about flexible planning,

01:34:06.000 | things that are more like cognition, that

01:34:08.000 | go on in prefrontal cortex,

01:34:10.000 | at this point, I don't

01:34:12.000 | think that just by recording from those

01:34:14.000 | neurons, we're going to be able to answer those questions

01:34:16.000 | in a meaningful engineering way.

01:34:18.000 | A way that any engineer, software,

01:34:20.000 | hardware, whatever, could really say,

01:34:22.000 | "Yeah, okay, I get it. I get those insights in a way

01:34:24.000 | that I can engineer with." And that's what my goal

01:34:26.000 | is. So that's my goal to do

01:34:28.000 | at the software level, the hardware level, or the entire

01:34:30.000 | systems level, connecting them.

01:34:32.000 | And I think that we can do that

01:34:34.000 | by taking what we're doing and bringing it to contact

01:34:36.000 | with people studying neural circuits. But I don't

01:34:38.000 | think you can leave this level out

01:34:40.000 | and just go straight to the neural circuits. And I think

01:34:42.000 | the more progress we make,

01:34:44.000 | the more we can help people who are studying

01:34:46.000 | at the neural circuit level. And they can help us

01:34:48.000 | address these other engineering questions that we don't

01:34:50.000 | really have access to, like the power issue

01:34:52.000 | or the speed issue.

01:34:54.000 | Thanks. That was great. So maybe

01:34:56.000 | we give Jack a big hand.

01:34:58.000 | [Applause]

01:35:00.000 | [Applause]

01:35:02.000 | [Applause]

01:35:04.000 | [Applause]

01:35:06.000 | [APPLAUSE]

MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)

Chapters