
MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)


Chapters

0:00
10:14 A long-term research roadmap
17:15 CBMM Architecture of Visual Intelligence
40:02 The Roots of Common Sense
52:35 The intuitive physics engine
55:19 Prediction by simulation

Whisper Transcript

00:00:00.000 | Today we have Josh Tenenbaum.
00:00:02.000 | He's a professor here at MIT,
00:00:04.320 | leading the computational cognitive science group.
00:00:07.040 | Among many other topics in cognition and intelligence,
00:00:11.000 | he is fascinated with the question of how human beings learn
00:00:14.280 | so much from so little.
00:00:16.280 | And how these insights can help us build AI systems
00:00:19.680 | that are much more efficient at learning from data.
00:00:22.000 | So please give Josh a warm welcome.
00:00:25.000 | [Applause]
00:00:31.000 | All right. Thank you very much.
00:00:33.000 | Thanks for having me.
00:00:35.000 | I'm excited to be part of what looks like
00:00:37.000 | really quite an impressive lineup,
00:00:39.000 | especially starting after today.
00:00:41.000 | And it's, I think, quite a great opportunity
00:00:44.000 | to get to see perspectives on artificial intelligence
00:00:47.000 | from many of the leaders in industry
00:00:49.000 | and other entities working on this great quest.
00:00:53.000 | So I'm going to talk to you about some of the work
00:00:55.000 | that we do in our group, but also I'm going to try to give
00:00:58.000 | a broader perspective reflective of a number of MIT faculty,
00:01:01.000 | especially those who are affiliated with
00:01:03.000 | the Center for Brains, Minds, and Machines.
00:01:05.000 | So you can see my affiliations up there.
00:01:07.000 | Academically, I'm part of Brain and Cognitive Science,
00:01:09.000 | or Course 9. I'm also part of CSAIL.
00:01:12.000 | But I'm also part of the Center for Brains, Minds, and Machines,
00:01:15.000 | which is an NSF-funded Science and Technology Center,
00:01:18.000 | which really stands for the bridge between
00:01:20.000 | the science and the engineering of intelligence.
00:01:22.000 | It literally straddles Vassar Street
00:01:24.000 | in that we have CSAIL and BCS members.
00:01:26.000 | We also have partners at Harvard and other academic institutions.
00:01:29.000 | And again, what we stand for, I want to try to convey
00:01:31.000 | some of the specific things we're doing in the center
00:01:34.000 | and where we want to go with a vision that really is about
00:01:37.000 | jointly pursuing the science, the basic science
00:01:40.000 | of how intelligence arises in the human mind and brain,
00:01:43.000 | and also the engineering enterprise
00:01:46.000 | of how to build something increasingly like
00:01:48.000 | human intelligence in machines.
00:01:50.000 | And we deeply believe that these two projects
00:01:52.000 | have something to do with each other
00:01:54.000 | and are best pursued jointly.
00:01:56.000 | Now, it's a really exciting time to be doing anything
00:01:58.000 | related to intelligence or certainly to AI
00:02:00.000 | for all the reasons that, you know, brought you all here.
00:02:03.000 | I don't have to tell you this.
00:02:05.000 | We have all these ways in which AI is kind of finally here.
00:02:08.000 | We finally live in the era of something like real practical AI.
00:02:11.000 | Or for those who've been around for a while
00:02:14.000 | and have seen some of the rises and falls,
00:02:16.000 | you know, AI is back in a big way.
00:02:18.000 | But from my perspective, and I think maybe this reflects,
00:02:21.000 | you know, why we distinguish what we might call AGI from AI,
00:02:24.000 | we don't really have any real AI, basically.
00:02:27.000 | We have what I like to call AI technologies,
00:02:29.000 | which are systems that do things we used to think
00:02:32.000 | that only humans could do, and now we have machines
00:02:34.000 | that do them, often quite well, maybe even better
00:02:37.000 | than any human who's ever lived, right,
00:02:39.000 | like a machine that plays Go.
00:02:41.000 | But none of these systems, I would say,
00:02:43.000 | are truly intelligent.
00:02:45.000 | None of them have anything like common sense.
00:02:47.000 | They have nothing like the flexible, general-purpose
00:02:49.000 | intelligence that each of you might use
00:02:51.000 | to learn every one of these skills or tasks, right?
00:02:54.000 | Each of these systems had to be built
00:02:56.000 | by large teams of engineers,
00:02:58.000 | working together often for a number of years,
00:03:00.000 | often at great cost to somebody who's willing to pay for it.
00:03:03.000 | And each of them just does one thing.
00:03:05.000 | So AlphaGo might beat the world's best,
00:03:07.000 | but it can't drive to the match
00:03:09.000 | or even tell you what Go is.
00:03:12.000 | It can't even tell you that Go is a game
00:03:14.000 | because it doesn't even know what a game is, right?
00:03:16.000 | So what's missing?
00:03:18.000 | What is it that makes every one of your brains--
00:03:20.000 | maybe you can't beat the world's best in Go,
00:03:23.000 | but any one of you can get behind the wheel of a car.
00:03:26.000 | I think of this because my daughter
00:03:28.000 | is going to turn 16 tomorrow.
00:03:30.000 | If she lived in California, she'd have a driver's license.
00:03:33.000 | It's a little bit down the line for us here in Massachusetts.
00:03:36.000 | But she didn't have to be specially engineered
00:03:39.000 | by billion-dollar startups,
00:03:41.000 | and she got really into chess recently,
00:03:43.000 | and now she's taught herself chess
00:03:45.000 | by playing just a handful of games, basically.
00:03:48.000 | And she can do any one of these activities,
00:03:50.000 | and any one of us can.
00:03:52.000 | So what is it? What makes up the difference?
00:03:54.000 | Well, there's many things, right?
00:03:56.000 | I'll talk about the focus for us in our research,
00:04:00.000 | and a lot of us, again, in CBMM, is summarized here.
00:04:04.000 | What drives the successes right now in AI,
00:04:08.000 | especially in industry, okay,
00:04:10.000 | in all these AI technologies, is many, many things.
00:04:13.000 | But where the progress has been made most recently
00:04:16.000 | and what's getting most of the attention
00:04:18.000 | is, of course, deep learning,
00:04:20.000 | but other kinds of machine-learning technologies
00:04:22.000 | which essentially represent the maturation
00:04:24.000 | of a decades-long effort
00:04:26.000 | to solve the problem of pattern recognition.
00:04:28.000 | That means taking data and finding patterns in the data
00:04:31.000 | that tell you something you care about,
00:04:33.000 | like how to label a class
00:04:35.000 | or how to predict some other signal, okay?
00:04:37.000 | And pattern recognition is great.
00:04:39.000 | It's an important part of intelligence,
00:04:41.000 | and it's reasonable to say that deep learning as a technology
00:04:45.000 | has really made great strides on pattern recognition
00:04:48.000 | and maybe even, you know, has come close
00:04:50.000 | to solving the problems of pattern recognition.
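
To make that narrow sense of pattern recognition concrete--mapping data to labels--here is a minimal, hypothetical sketch: a tiny logistic-regression classifier fit to synthetic two-cluster data. None of this is from the talk or from CBMM; it only illustrates the data-in, label-out framing described above.

```python
# A minimal illustration of pattern recognition as described above:
# fit a simple classifier to labeled data, then use it to label new points.
# The synthetic data and the tiny logistic-regression loop are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "data with a pattern": two Gaussian clusters, labels 0 and 1.
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),
               rng.normal(+1.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Logistic regression trained by gradient descent: a pure pattern -> label mapping.
w, b = np.zeros(2), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of class 1
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

new_point = np.array([0.8, 1.2])
print("predicted label:", int((new_point @ w + b) > 0))
```
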
00:04:53.000 | But intelligence is about many other things.
00:04:55.000 | Intelligence is about a lot more.
00:04:57.000 | In particular, it's about modeling the world.
00:05:00.000 | And think about all the activities that a human does
00:05:03.000 | to model the world that go beyond just, say,
00:05:05.000 | recognizing patterns in data,
00:05:07.000 | but actually trying to explain and understand
00:05:09.000 | what we see, for instance, okay?
00:05:11.000 | Or to be able to imagine things that we've never seen,
00:05:14.000 | maybe even very different
00:05:16.000 | from anything we've ever seen, but might want to see,
00:05:19.000 | and then to set those as goals,
00:05:21.000 | to make plans and solve problems
00:05:23.000 | needed to make those things real.
00:05:25.000 | Or thinking about learning, again, you know,
00:05:27.000 | some kinds of learning can be thought of as pattern recognition
00:05:30.000 | if you're learning sufficient statistics
00:05:32.000 | or weights in a neural net that are used for those purposes.
00:05:35.000 | But many activities of learning are about building out new models,
00:05:38.000 | right, either refining, reusing, improving old models,
00:05:41.000 | or actually building fundamentally new models
00:05:43.000 | as you experience more of the world.
00:05:45.000 | And then think about sharing our models,
00:05:47.000 | communicating our models to others,
00:05:49.000 | modeling their models, learning from them.
00:05:51.000 | All these activities of modeling,
00:05:53.000 | these are at the heart of human intelligence,
00:05:55.000 | and they require a much broader set of tools.
00:05:58.000 | So I want to talk about the ways we're studying
00:06:00.000 | these activities of modeling the world,
00:06:02.000 | and to say something, in a pretty non-technical way,
00:06:04.000 | about the kinds of tools
00:06:06.000 | that we need to capture these abilities.
00:06:08.000 | Now, I think it's-- I want to be very honest up front
00:06:11.000 | and to say this is just the beginning of a story, right?
00:06:14.000 | When you look at deep learning successes,
00:06:16.000 | that itself is a story that goes back decades.
00:06:18.000 | I'll say a little bit about that history in a minute.
00:06:20.000 | But where we are now is just looking forward to a future
00:06:23.000 | when we might be able to capture these abilities,
00:06:25.000 | you know, at a really mature engineering scale.
00:06:27.000 | And I would say we are far from being able to capture
00:06:30.000 | all the ways in which humans richly, flexibly, quickly
00:06:33.000 | build models of the world at the kind of scale
00:06:35.000 | that, say, Silicon Valley wants,
00:06:37.000 | either big tech companies like Google or Microsoft
00:06:39.000 | or IBM or Facebook or small startups, right?
00:06:42.000 | We can get there.
00:06:44.000 | And I think what I want to talk to you about here
00:06:47.000 | is one route for trying to get there,
00:06:49.000 | and this is the route that CBMM stands for,
00:06:51.000 | the idea that by reverse engineering
00:06:53.000 | how intelligence works in the human mind and brain,
00:06:55.000 | that will give us a route
00:06:57.000 | to engineering these abilities in machines.
00:06:59.000 | When we say reverse engineering, we're talking about science,
00:07:02.000 | but doing science like engineers.
00:07:04.000 | That's our fundamental principle,
00:07:06.000 | that if we approach cognitive science and neuroscience
00:07:08.000 | like an engineer, where, say, the output of our science
00:07:10.000 | isn't just a description of the brain or the mind in words,
00:07:13.000 | but in the same terms that an engineer would use
00:07:15.000 | to build an intelligent system,
00:07:17.000 | then that will be both the basis for a much more rigorous
00:07:19.000 | and deeply insightful science,
00:07:21.000 | but also direct translation of those insights
00:07:23.000 | into engineering applications.
00:07:25.000 | Now, I said before I talk a little about history,
00:07:28.000 | what I mean by that is this.
00:07:30.000 | Again, if part of what brought you here is deep learning,
00:07:33.000 | and even if you'd never heard of deep learning before,
00:07:35.000 | which I'm sure is unlikely,
00:07:37.000 | you saw a good spectrum of that
00:07:39.000 | in the overview session last night.
00:07:42.000 | It's really interesting and important
00:07:45.000 | to look back on the history of where did techniques
00:07:47.000 | for deep learning come from, or reinforcement learning.
00:07:50.000 | Those are the two tools in the current machine learning arsenal
00:07:53.000 | that are getting the most attention,
00:07:55.000 | things like back propagation or end-to-end stochastic gradient descent
00:07:58.000 | or temporal difference learning or Q-learning.
00:08:00.000 | Here's a few papers from the literature.
00:08:02.000 | Maybe some of you have read these original papers.
00:08:04.000 | Here's the original paper by Rumelhart, Hinton, and colleagues
00:08:07.000 | in which they introduced the back propagation algorithm
00:08:09.000 | for training multilayer perceptrons,
00:08:11.000 | multilayer neural networks.
00:08:13.000 | Here's the original perceptron paper by Rosenblatt,
00:08:15.000 | which introduced the one-layer version of that architecture
00:08:17.000 | and the basic perceptron learning algorithm.
00:08:20.000 | Here's the first paper on the temporal difference learning method
00:08:24.000 | for reinforcement learning from Sutton and Barto.
00:08:27.000 | Here's the original Boltzmann machine paper,
00:08:29.000 | also by Hinton and colleagues,
00:08:31.000 | which, again, those of you who don't know that architecture,
00:08:34.000 | think of a kind of probabilistic, undirected, multilayer perceptron.
00:08:39.000 | Or, for example, before there were LSTMs,
00:08:42.000 | if you know about current recurrent neural network architectures,
00:08:45.000 | earlier, much simpler versions of the same idea
00:08:47.000 | were proposed by Jeff Elman in his simple recurrent networks.
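
For readers who haven't seen it, here is a minimal sketch of the one-layer perceptron learning rule from the Rosenblatt paper mentioned above--the update that predates backpropagation. The toy data and variable names are illustrative, not taken from the original paper.

```python
# Minimal sketch of Rosenblatt's perceptron learning rule (the one-layer
# algorithm referenced above). The toy data below is illustrative only.
import numpy as np

# Linearly separable toy data: four 2-D points with +/-1 labels.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

w = np.zeros(2)   # weights
b = 0.0           # bias

for epoch in range(10):
    errors = 0
    for xi, target in zip(X, y):
        pred = 1 if (w @ xi + b) >= 0 else -1
        if pred != target:
            # Perceptron update: nudge the boundary toward the misclassified point.
            w += target * xi
            b += target
            errors += 1
    if errors == 0:       # converged: all training points classified correctly
        break

print("learned weights:", w, "bias:", b)
```
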
00:08:50.000 | The reason I want to put up the original papers here
00:08:52.000 | is for you to look at both when they were published
00:08:55.000 | and where they were published.
00:08:57.000 | So if you look at the dates, you'll see papers going back
00:08:59.000 | to the '80s, but even the '60s, or even the 1950s.
00:09:03.000 | And look at where they were published.
00:09:05.000 | Most of them were published in psychology journals.
00:09:08.000 | So the journal Psychological Review, if you don't know it,
00:09:10.000 | is like the leading journal of theoretical psychology
00:09:12.000 | and mathematical psychology.
00:09:14.000 | Or Cognitive Science, the journal of the Cognitive Science Society.
00:09:17.000 | Or the backprop paper was published in Nature,
00:09:20.000 | which is a general interest science journal,
00:09:22.000 | but by people who are mostly affiliated with
00:09:24.000 | the Institute for Cognitive Science in San Diego.
00:09:26.000 | So what you see here is already a long history
00:09:29.000 | of scientists thinking like engineers.
00:09:31.000 | These are people who are in psychology or cognitive science departments
00:09:34.000 | and publishing in those places, but by formalizing
00:09:37.000 | even very basic insights about how humans might learn,
00:09:40.000 | or how brains might learn, in the right kind of math,
00:09:44.000 | that led to, of course, progress on the science side,
00:09:47.000 | but it led to all the engineering that we see now.
00:09:49.000 | It wasn't sufficient, right?
00:09:51.000 | We needed, of course, lots of innovations and advances
00:09:54.000 | in computing hardware and software systems, right?
00:09:57.000 | But this is where the basic math came from,
00:10:00.000 | and it came from doing science like an engineer.
00:10:03.000 | So what I want to talk about in our vision is,
00:10:05.000 | what does the future of this look like?
00:10:07.000 | If we were to look 50 years into the future,
00:10:09.000 | what would we be looking back on now, or over this time scale?
00:10:12.000 | Well, here's a long-term research roadmap that reflects
00:10:15.000 | some of my ambitions and some of our center's goals,
00:10:18.000 | and many others, too.
00:10:20.000 | We'd like to be able to address basic questions,
00:10:22.000 | questions of what it is to be and to think like a human,
00:10:25.000 | questions, for example, of consciousness,
00:10:27.000 | or meaning and language, or real learning, right?
00:10:30.000 | Questions like, even beyond the individual,
00:10:33.000 | like questions of culture, or creativity.
00:10:36.000 | Those are our big ideas up there.
00:10:38.000 | And for each of these, there are basic scientific questions.
00:10:40.000 | How do we become aware of the world and ourselves in it?
00:10:43.000 | It starts with perception, but it really turns into awareness,
00:10:46.000 | awareness of yourself and of the world,
00:10:48.000 | and what we might call consciousness, right?
00:10:50.000 | How does a word start to have a meaning?
00:10:52.000 | What really is a meaning, and how does a child grasp it?
00:10:54.000 | Or how do children actually learn?
00:10:56.000 | What do babies' brains actually start with?
00:10:58.000 | Are they blank slates, or do they start with some kind of
00:11:00.000 | cognitive structure?
00:11:01.000 | And then what does real learning look like?
00:11:03.000 | These are just some of the questions that we're interested
00:11:05.000 | in working on.
00:11:06.000 | Or when we talk about culture, we mean,
00:11:08.000 | how do you learn all the things you didn't directly experience,
00:11:11.000 | right, but that somehow you got from the accumulation of
00:11:13.000 | knowledge in society over many generations?
00:11:16.000 | Or how do you ever think of new ideas,
00:11:18.000 | or answers to new questions?
00:11:19.000 | How do you think of the new questions themselves?
00:11:21.000 | How do you decide what to think about?
00:11:23.000 | These are all key activities of human intelligence.
00:11:25.000 | When we talk about how we model the world,
00:11:27.000 | where our models come from, what we do with our models,
00:11:29.000 | this is what we're talking about.
00:11:31.000 | And if we could get machines that could do these things,
00:11:33.000 | well, again, on the bottom row, think of all the actual
00:11:35.000 | real engineering payoffs.
00:11:36.000 | Now, in our center, in both my own activities and a lot of what
00:11:40.000 | my group does these days, and what a number of other
00:11:42.000 | colleagues in the Center for Brains, Minds, and Machines do,
00:11:45.000 | as well as, you know, very broadly people in BCS and CSAIL,
00:11:48.000 | one place where we work on the beginnings of these problems,
00:11:50.000 | in the near term, this is the long term,
00:11:52.000 | like think 50 years, okay, maybe shorter, maybe longer,
00:11:55.000 | I don't know, but think well beyond 10 years.
00:11:59.000 | But in the short term, 5 to 10 years,
00:12:01.000 | a lot of our focus is around visual intelligence,
00:12:03.000 | and there's many reasons for that.
00:12:05.000 | Again, we can build on the successes of deep networks,
00:12:07.000 | and a lot of pattern recognition and machine vision.
00:12:10.000 | It's a good way to put these ideas into practice.
00:12:12.000 | When we look at the actual brain,
00:12:14.000 | the visual system in the brain, in the human and other
00:12:17.000 | mammalian brains, for example, is really,
00:12:19.000 | very clearly the best understood part of the brain,
00:12:21.000 | and at a circuit level, it's the part of the brain
00:12:23.000 | that's most inspired current deep learning
00:12:26.000 | and neural network systems.
00:12:28.000 | But even there, there's things which we still don't
00:12:30.000 | really understand like engineers.
00:12:32.000 | So here's an example of a basic problem in visual intelligence
00:12:35.000 | that we and others in the Center are trying to solve.
00:12:39.000 | Look around you, and you feel like there's a whole world
00:12:43.000 | around you, and there is a whole world around you,
00:12:45.000 | and you feel like your brain captures it.
00:12:47.000 | But the actual sense data that's coming in through
00:12:50.000 | your eyes looks more like this photograph here,
00:12:52.000 | where you can see there's a crowd scene,
00:12:54.000 | but it's mostly blurry except for a small region
00:12:56.000 | of high resolution in the center.
00:12:58.000 | So that corresponds biologically to what part of the image
00:13:00.000 | is in your fovea.
00:13:02.000 | That's the central region of cells in the retina,
00:13:04.000 | where you have really high resolution visual data.
00:13:07.000 | The size of your fovea is roughly like if you hold out
00:13:09.000 | your thumb at arm's length, it's a little bit bigger
00:13:11.000 | than that, but not much bigger.
00:13:13.000 | Most of the image, in terms of the actual information
00:13:16.000 | coming in in a bottom-up sense to your brain,
00:13:18.000 | is really quite blurry.
00:13:20.000 | But somehow by looking at just one part,
00:13:22.000 | and then by saccading around or making a few eye movements,
00:13:25.000 | you get a few glimpses, each not much bigger than the size
00:13:27.000 | of your thumb at arm's length.
00:13:29.000 | Somehow you stitch that information together
00:13:31.000 | into what feels like and really is a rich representation
00:13:34.000 | of the whole world around you.
00:13:36.000 | And when I say around you, I mean literally around you.
00:13:38.000 | So here's another kind of demonstration.
00:13:41.000 | Without turning around, nobody's allowed to turn around,
00:13:44.000 | ask yourself, what's behind you?
00:13:46.000 | Now the answer's going to be different for different people,
00:13:48.000 | depending on where you're sitting.
00:13:50.000 | For most of you, you might think, well,
00:13:52.000 | I think there's a person pretty close behind me.
00:13:55.000 | You know you're in a crowded auditorium,
00:13:56.000 | although you haven't seen that person.
00:13:58.000 | You know that they're there.
00:14:00.000 | For people in the very back row, you know there isn't
00:14:02.000 | a person behind you, and you're conscious of being
00:14:04.000 | in the back row.
00:14:05.000 | You might be conscious that there's a wall right behind you.
00:14:07.000 | But now for the people who are in the room,
00:14:10.000 | not in the very back, think about how far behind you
00:14:13.000 | is the back, like where's the nearest wall behind you?
00:14:15.000 | So we can, maybe we can call out,
00:14:17.000 | try a little demonstration.
00:14:18.000 | So I don't know, I'm pointing to someone there.
00:14:20.000 | Can you see, say something if you think I'm pointing at you.
00:14:24.000 | Well, I could have been pointing at you,
00:14:25.000 | but I'm pointing to someone behind you, okay.
00:14:27.000 | I'll point to you, yeah, I'm pointing to you.
00:14:29.000 | All right, so how far is the nearest wall?
00:14:31.000 | No, you can't turn around, you've blown your chance.
00:14:33.000 | (audience laughing)
00:14:34.000 | Without turning around, okay, so you were,
00:14:36.000 | okay, do you see I'm pointing to you there with the tie?
00:14:38.000 | Okay, so without turning around,
00:14:40.000 | how far is the nearest wall behind you?
00:14:42.000 | That's, sorry, how far?
00:14:48.000 | Five meters, okay, well, I mean,
00:14:50.000 | that might be about right.
00:14:51.000 | No, no, other people can turn around.
00:14:54.000 | Now, how about you, how far is the nearest wall behind you?
00:14:57.000 | 10 meters, okay, that might be right, yeah.
00:15:03.000 | How about here, what do you think?
00:15:06.000 | 20, okay, so yeah, since I didn't grow up
00:15:09.000 | in the metric system, I barely know, but yeah,
00:15:11.000 | I mean, the point is that, like,
00:15:13.000 | you're, each of you is surely not exactly right,
00:15:17.000 | but you're certainly within an order of magnitude,
00:15:19.000 | and I guess if we actually tried to measure,
00:15:21.000 | you know, my guess is you're probably right
00:15:23.000 | within, you know, 50% or less,
00:15:25.000 | often, you know, maybe just 20% error.
00:15:28.000 | Okay, so how do you know this?
00:15:29.000 | I mean, even if it's not, what did you say, 20 meters?
00:15:31.000 | Even if it's not 20 meters, it's probably closer to 20 meters
00:15:34.000 | than it is to five or 10 meters,
00:15:36.000 | than it is to 50 meters, so how do you know this?
00:15:38.000 | You haven't turned around in a while, right?
00:15:40.000 | But some part of your brain is tracking
00:15:42.000 | the whole world around you, right?
00:15:44.000 | And how many people are behind you?
00:15:46.000 | Yeah, like a few hundred, right?
00:15:49.000 | I mean, I don't know if it's 200 or 300,
00:15:51.000 | but it's not 1,000, I don't think so,
00:15:54.000 | and it's certainly not 10 or 20 or 50, right?
00:15:57.000 | So you track these things,
00:15:59.000 | and you use them to plan your actions.
00:16:01.000 | Okay, so again, think about how instantly, effortlessly,
00:16:04.000 | and very reliably, okay,
00:16:06.000 | your brain computes all these things,
00:16:08.000 | so the people and objects around you,
00:16:10.000 | and it's not just, you know, approximations.
00:16:12.000 | Certainly, when we're talking about what's behind you in space,
00:16:15.000 | there's a lot of imprecision,
00:16:17.000 | but when it comes to reaching for things right in front of you,
00:16:20.000 | very precise shape and physical property estimates
00:16:22.000 | needed to pick up and manipulate objects,
00:16:24.000 | and then when it comes to people,
00:16:26.000 | it's not just the existence of the people,
00:16:28.000 | but something about what's in their head, right?
00:16:30.000 | You track whether someone's paying attention to you
00:16:32.000 | when you're talking to them, what they might want from you,
00:16:34.000 | what they might be thinking about you,
00:16:36.000 | what they might be thinking about other people, okay?
00:16:38.000 | So when we talk about visual intelligence,
00:16:40.000 | this is the whole stuff we're talking about,
00:16:42.000 | and you can start to see how it turns into basic questions,
00:16:45.000 | I think, of what we might call the beginnings of consciousness,
00:16:49.000 | or at least our awareness of ourself in the world,
00:16:52.000 | and of ourselves as a self in the world,
00:16:55.000 | but also other aspects of higher-level intelligence
00:16:57.000 | and cognition that are not just about perception,
00:16:59.000 | like symbols, right, to describe, even to ourselves,
00:17:02.000 | what's around us and where we are and what we can do with it.
00:17:05.000 | You have to go beyond just what we would normally call
00:17:08.000 | the stuff of perception to, say, the thoughts in somebody's head
00:17:11.000 | and your own thoughts about that, okay?
00:17:13.000 | So what we've been doing in CBMM
00:17:15.000 | is trying to develop an architecture
00:17:17.000 | for visual intelligence,
00:17:19.000 | and I'm not going to go into any of the details of how this works,
00:17:22.000 | and this is just notional, this is just a picture,
00:17:24.000 | it's like just a sketch from a grant proposal
00:17:26.000 | of what we say we want to do,
00:17:28.000 | but it's based on a lot of scientific understanding
00:17:30.000 | of how the brain works.
00:17:32.000 | There are different parts of the brain that correspond
00:17:34.000 | to these different modules in our architecture,
00:17:36.000 | as well as some kind of emerging engineering way
00:17:38.000 | to try to capture at the software
00:17:40.000 | and maybe even hardware levels how these modules might work.
00:17:43.000 | So we talk about sort of an early module
00:17:45.000 | of a visual or perceptual stream,
00:17:47.000 | which is like bottom-up visual or other perceptual input.
00:17:50.000 | That's the kind of thing that is pretty close
00:17:52.000 | to what we currently have in, say,
00:17:54.000 | deep convolutional neural networks.
00:17:56.000 | But then we talk about some kind of--
00:17:58.000 | the output of that isn't just pattern class labels,
00:18:00.000 | but what we call the cognitive core,
00:18:02.000 | core cognition.
00:18:04.000 | So again, an understanding of space and objects,
00:18:06.000 | their physics, other people, their minds,
00:18:08.000 | that's the real stuff of cognition
00:18:10.000 | that has to be the output of perception.
00:18:12.000 | But somehow we have to have--
00:18:14.000 | this is what we call the brain OS in this picture.
00:18:17.000 | We have to get there by stitching together
00:18:19.000 | the bottom-up inputs from a glimpse here,
00:18:21.000 | a glimpse here, a little bit here and there,
00:18:23.000 | and accessing prior knowledge
00:18:25.000 | from our memory systems to tell us
00:18:27.000 | how to stitch these things together
00:18:29.000 | into the really core cognitive representations
00:18:31.000 | of what's out there in the world.
00:18:33.000 | And then if we're going to start to talk about it in language
00:18:36.000 | or to build plans on top of what we have seen and understood,
00:18:40.000 | that's where we talk about symbols coming into the picture,
00:18:44.000 | the building blocks of language and plans and so on.
00:18:48.000 | So now we might say, well, okay, this is an architecture
00:18:52.000 | that is brain-inspired and cognitively inspired,
00:18:55.000 | and we're planning to turn into real engineering.
00:18:57.000 | And you can say, well, do we need that?
00:18:59.000 | Maybe--again, I know this is a question
00:19:01.000 | you considered in the first lecture.
00:19:03.000 | Maybe the engineering toolkit that's currently
00:19:05.000 | been making a lot of progress in, let's say, industry,
00:19:07.000 | maybe that's good enough.
00:19:09.000 | Maybe let's take deep learning, but to stand for a broader set
00:19:12.000 | of modern pattern-recognition-based
00:19:14.000 | and reinforcement-learning-based tools
00:19:16.000 | and say, okay, well, maybe that can scale up to this.
00:19:20.000 | And you might--maybe that's possible.
00:19:22.000 | I'm happy in the question period if people want to debate this.
00:19:25.000 | My sense is no.
00:19:27.000 | I think that it's not--when I say no,
00:19:29.000 | I don't mean, like, it can't happen or it won't happen.
00:19:32.000 | What I mean is that the highest-value,
00:19:34.000 | highest-expected-value route right now
00:19:36.000 | is to take this more science-based
00:19:38.000 | reverse engineering approach, and that at least
00:19:40.000 | if you follow the current trajectory
00:19:42.000 | that industry incentives especially optimize for,
00:19:45.000 | it's not even really trying to take us to these things.
00:19:48.000 | So think about, for example, a case study
00:19:50.000 | of visual intelligence that is in some ways
00:19:52.000 | as pattern recognition very much of a success.
00:19:54.000 | It's, again, been mostly driven by industry.
00:19:56.000 | It's something that, if you read about it in the news
00:19:59.000 | or even play around with in certain publicly available
00:20:02.000 | data sets, feels like we've made great progress.
00:20:04.000 | And this is an aspect of visual intelligence,
00:20:07.000 | which is sometimes called image captioning
00:20:09.000 | or mapping images to text.
00:20:12.000 | You know, basically there's been a bunch of systems.
00:20:15.000 | Here's a couple of press releases.
00:20:17.000 | This one's about Google.
00:20:19.000 | Google's AI can now caption images
00:20:21.000 | almost as well as humans.
00:20:23.000 | Here's one about Microsoft.
00:20:25.000 | A couple of years ago, I think, there were something
00:20:27.000 | like eight papers all released onto arXiv
00:20:29.000 | around the same time from basically all the major
00:20:31.000 | industry computer vision groups as well as a couple
00:20:33.000 | of academic partners, which were all driven
00:20:36.000 | by basically the same data set produced
00:20:38.000 | by some Microsoft researchers and other collaborators,
00:20:41.000 | trained a combination of deep convolutional neural networks,
00:20:44.000 | state-of-the-art visual pattern recognition,
00:20:47.000 | with recurrent neural networks, which had recently
00:20:49.000 | been developed for basically kinds of neural
00:20:51.000 | statistical language modeling, glued them together
00:20:54.000 | and produced a system which made very impressive results
00:20:57.000 | in a big training set and a held-out test set
00:21:00.000 | where the goal was to take an image and write
00:21:02.000 | a short, sentence-like caption
00:21:05.000 | that would seem like the kind of way
00:21:07.000 | a human would describe that image.
00:21:09.000 | And these systems surpassed human-level accuracy
00:21:12.000 | on the held-out test set from a big training set.
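
Schematically, the shared recipe in those papers--a deep convolutional encoder feeding a recurrent decoder that emits a caption word by word--looks something like the toy sketch below. The layer sizes, vocabulary, and random inputs are made up for illustration; the actual systems were far larger and trained on large captioned-image datasets.

```python
# A schematic, made-up sketch of the "CNN + RNN" captioning recipe described
# above: a convolutional encoder turns the image into a feature vector, and a
# recurrent decoder scores the next caption word at each step. This is not the
# architecture of any specific published system.
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 1000, 256, 512

class TinyCaptioner(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: a small stand-in for a deep convolutional network.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, HIDDEN_DIM),
        )
        # Decoder: an LSTM language model conditioned on the image feature.
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.to_vocab = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, image, caption_tokens):
        feat = self.encoder(image)          # (batch, HIDDEN_DIM) image feature
        h0 = feat.unsqueeze(0)              # use it as the LSTM's initial state
        c0 = torch.zeros_like(h0)
        emb = self.embed(caption_tokens)    # (batch, seq, EMBED_DIM)
        out, _ = self.lstm(emb, (h0, c0))
        return self.to_vocab(out)           # per-step scores over the vocabulary

model = TinyCaptioner()
image = torch.randn(1, 3, 224, 224)              # a fake image
tokens = torch.randint(0, VOCAB_SIZE, (1, 5))    # a fake partial caption
print(model(image, tokens).shape)                # torch.Size([1, 5, 1000])
```
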
00:21:15.000 | But what you can see when you really dig into these things
00:21:17.000 | is there's often a lot of what I would call
00:21:19.000 | data set overfitting.
00:21:21.000 | It's not overfitting to the training set,
00:21:23.000 | but it's overfitting to whatever are the particular
00:21:25.000 | characteristics of this data set,
00:21:27.000 | wherever it came from, a certain set of photographs
00:21:29.000 | and certain ways of captioning them.
00:21:32.000 | Even with a big data set, it's not about quantity.
00:21:35.000 | It's more about the quality, the nature
00:21:37.000 | of what people are doing.
00:21:39.000 | So one way to test this system is to apply it
00:21:42.000 | to what seems like basically the same problem,
00:21:45.000 | but not within a certain curated or built data set.
00:21:49.000 | And there's a convenient Twitter bot that lets you do this.
00:21:52.000 | So there's something called picdescbot,
00:21:54.000 | which takes one of the state-of-the-art
00:21:56.000 | industry AI captioning systems, a very good one.
00:21:58.000 | Again, this is not meant to--I'm not trying
00:22:00.000 | to critique these systems for what they're trying to do.
00:22:02.000 | I'm just trying to point out what they don't really
00:22:04.000 | even try to do.
00:22:06.000 | So this takes the Microsoft CaptionBot,
00:22:08.000 | and just every couple of hours takes a random image
00:22:10.000 | from the web, captions it, and uploads the results to Twitter.
00:22:13.000 | And a couple of months ago, when I prepared
00:22:15.000 | a first version of this talk, I just took a few days
00:22:18.000 | in the life of this Twitter bot.
00:22:20.000 | I didn't take every single image, but I took, you know,
00:22:22.000 | most of the images in a way that was meant
00:22:24.000 | to be representative of the successes
00:22:26.000 | and the kinds of failures that such a system will make.
00:22:28.000 | So we can go through this, and it's a little bit
00:22:30.000 | entertaining and I think quite informative.
00:22:32.000 | So here's just a somewhat random sample
00:22:35.000 | of a few days in the life of one of these CaptionBots.
00:22:39.000 | So here we have a picture of a person holding--
00:22:42.000 | fortunately, my screen is very small here,
00:22:44.000 | and I can't read up there, so maybe you'll have to tell me
00:22:46.000 | what it says--but a person holding a cell phone.
00:22:48.000 | I guess I'll just read along with you.
00:22:50.000 | So we have a person holding a cell phone.
00:22:52.000 | Well, it's not a person holding a cell phone,
00:22:54.000 | but it's kind of close.
00:22:56.000 | It's a person holding some kind of machine.
00:22:58.000 | I don't even know what that is,
00:23:00.000 | but it's some kind of musical instrument, right?
00:23:02.000 | So that's a mixed success or failure.
00:23:04.000 | Here's a person on a field playing football.
00:23:06.000 | I would call that an A result, maybe even A+.
00:23:10.000 | Here's a group of people standing on top of a mountain.
00:23:13.000 | So less good. There's a mountain,
00:23:15.000 | but as far as I can tell, there's no people.
00:23:17.000 | But these systems like to see people
00:23:19.000 | because in the data set they were trained on,
00:23:21.000 | there's a lot of people, and people often talk about people.
00:23:24.000 | And the fact that you can appreciate
00:23:26.000 | both what I said and why it's funny--
00:23:28.000 | there you did some of my cognitive activities
00:23:30.000 | that this system is not even trying to do.
00:23:32.000 | Okay, here we've got a building with a cake.
00:23:34.000 | I'll go through these fast.
00:23:36.000 | A building with a cake, a large stone building with a clock tower.
00:23:39.000 | I think that's pretty good. I'd give that like a B+.
00:23:41.000 | There's no clock, but it's plausibly right.
00:23:43.000 | There might be a clock in there.
00:23:45.000 | There's definitely something like that.
00:23:47.000 | Here's a truck parked on the side of a building.
00:23:49.000 | I don't know, maybe a B-.
00:23:51.000 | There is a car on the side of a building,
00:23:53.000 | but it's not a truck, and it doesn't seem like the main thing in the image.
00:23:56.000 | Here's a necklace made of bananas.
00:23:58.000 | (laughter)
00:24:00.000 | A large ship in the water.
00:24:02.000 | This is pretty good. I'd give this like an A- or B+,
00:24:04.000 | because there is a ship in the water, but it's not very large.
00:24:07.000 | It's really more of like a tugboat or something.
00:24:09.000 | Here's a sign sitting on the grass.
00:24:11.000 | You know, in some sense, that's great.
00:24:13.000 | No, but in another sense, it's really missing
00:24:15.000 | what's actually interesting and important and meaningful to humans.
00:24:18.000 | Here's a garden that's in the dirt.
00:24:22.000 | (laughter)
00:24:24.000 | A pizza sitting on top of a building.
00:24:26.000 | A small house with a red brick building.
00:24:28.000 | I don't know, it's kind of a weird way of saying it.
00:24:30.000 | A vintage photo of a pond. That's good.
00:24:32.000 | They like vintage photos.
00:24:34.000 | A group of people that are standing in the grass near a bridge.
00:24:36.000 | Again, there's two people, and there's some grass, and there's a bridge,
00:24:39.000 | but it's really not what's going on.
00:24:41.000 | A person in a yard. Okay, kind of.
00:24:43.000 | A group of people standing on top of a boat.
00:24:45.000 | There's a boat, there's a group of people, they're standing,
00:24:48.000 | but again, the sentence that you see is more based on a bias
00:24:51.000 | of what people have said in the past about images that are only vaguely like this.
00:24:55.000 | A clock tower is lit up at night.
00:24:57.000 | That's actually, I think, pretty impressive.
00:24:59.000 | A large clock mounted to the side of a building.
00:25:01.000 | A little bit less so.
00:25:03.000 | A snow-covered field. Very good.
00:25:05.000 | A building with snow on the ground. A little bit less good.
00:25:07.000 | There's no snow. It's white.
00:25:09.000 | Some people who--I don't know them, but I bet that's probably right,
00:25:12.000 | because identifying faces and recognizing people who are famous
00:25:15.000 | because they won medals in the Olympics,
00:25:17.000 | probably I would trust current pattern recognition systems to get that.
00:25:21.000 | A painting of a vase in front of a mirror.
00:25:23.000 | Less good.
00:25:25.000 | I think there's a guy in there, but we didn't get him.
00:25:27.000 | A person walking in the rain.
00:25:29.000 | Again, there is sort of a person, and there's some puddles,
00:25:32.000 | but a group of stuffed animals.
00:25:35.000 | [laughter]
00:25:37.000 | A car parked in a parking lot. That's good.
00:25:39.000 | A car parked in front of a building.
00:25:41.000 | Less good.
00:25:43.000 | A plate with a fork and knife. A clear blue sky.
00:25:45.000 | Okay, so you get the idea.
00:25:47.000 | Again, if you actually go and play with this system,
00:25:50.000 | partly because I think--my friends at Microsoft told me they've improved it some.
00:25:54.000 | This is partly for entertainment value.
00:25:56.000 | I chose what would also be the funnier examples.
00:25:59.000 | I want to be quite honest about this.
00:26:01.000 | I'm not trying to take away what are impressive AI technologies,
00:26:05.000 | but I think it's clear that there's a sense of understanding
00:26:08.000 | in any one of these images--it's important to see
00:26:13.000 | that if it can make the kind of errors that it makes,
00:26:15.000 | then even when it seems to be correct,
00:26:17.000 | it's probably not doing what you're doing,
00:26:19.000 | and it's probably not even trying to scale towards the dimensions
00:26:22.000 | of intelligence that we think about when we're talking about human intelligence.
00:26:26.000 | Another way to put this--I'm going to show you a really insightful blog post
00:26:29.000 | from one of your other speakers.
00:26:31.000 | In a couple of days, I'm not sure, you're going to have Andrej Karpathy,
00:26:34.000 | who's one of the leading people in deep learning.
00:26:38.000 | This is a really great blog post he wrote a couple of years ago
00:26:41.000 | when he was, I think, still at Stanford.
00:26:43.000 | He got his PhD from Stanford.
00:26:45.000 | He worked at Google a little bit on some early big neural net AI projects there.
00:26:51.000 | He was at OpenAI. He was one of the founders of OpenAI.
00:26:54.000 | Recently, he joined Tesla as their director of AI research.
00:26:58.000 | About five years ago, he was looking at the state of computer vision
00:27:02.000 | from a human intelligence point of view and lamenting how far away we were.
00:27:06.000 | This is the title of his blog post, "The State of Computer Vision and AI."
00:27:10.000 | We are really, really far away.
00:27:12.000 | He took this image, which was a famous image in its own right.
00:27:17.000 | It was a popular image of Obama back when he was president
00:27:20.000 | playing around as he liked to do when he was on tour.
00:27:22.000 | If you take a look at this, you can see you probably all can recognize
00:27:26.000 | the previous president of the United States,
00:27:28.000 | but you can also get the sense of where he is and what's going on.
00:27:31.000 | You might see people smiling, and you might get the sense
00:27:33.000 | that he's playing a joke on someone. Can you see that?
00:27:36.000 | How do you know that he's playing a joke and what that joke is?
00:27:40.000 | As Andrej goes on to talk about in his blog post,
00:27:43.000 | if you think about all the things that you have to really deploy in your mind
00:27:47.000 | to understand that, it's a huge list.
00:27:49.000 | Of course, it starts with seeing people and objects
00:27:51.000 | and maybe doing some face recognition,
00:27:53.000 | but you have to do things like, for example,
00:27:55.000 | notice his foot on the scale and understand enough about how scales work
00:27:59.000 | that when a foot presses down, it exerts force,
00:28:01.000 | that the scale is sensitive.
00:28:03.000 | It doesn't just magically measure people's weight,
00:28:05.000 | but it does that somehow through force.
00:28:07.000 | You have to see who can see that he's doing that
00:28:09.000 | and who cannot see that he's doing that,
00:28:11.000 | in particular the person on the scale,
00:28:13.000 | and why some people can see that he's doing that
00:28:15.000 | and can see that some other people can't see it,
00:28:17.000 | why that makes it funny to them.
00:28:19.000 | Someday we should have machines that can understand this,
00:28:23.000 | but hopefully you can see why the kind of architecture
00:28:28.000 | that I'm talking about would be the building blocks
00:28:31.000 | or the ingredients to be able to get them to do that.
00:28:34.000 | Again, I prepared a version of this talk a few months ago,
00:28:37.000 | and I wrote to Andrej and I said I was going to use this,
00:28:40.000 | and I was curious if he had any reflections on this
00:28:44.000 | and where he thought we were relative to five years ago,
00:28:47.000 | because certainly a lot of progress has been made.
00:28:49.000 | But he said--here's his email.
00:28:51.000 | I hope he doesn't mind me sharing it,
00:28:53.000 | but again, he's a very honest person,
00:28:55.000 | and that's one of the many reasons why he's such an important person right now in AI.
00:28:58.000 | He's both very technically strong and honest about what we can do, what we can't do,
00:29:02.000 | and as he says--what does he say?
00:29:04.000 | "It's nice to hear from you. It's fun you should bring this up.
00:29:06.000 | I was also thinking about writing a return to this."
00:29:09.000 | And in short, basically, I don't believe we've made very much progress.
00:29:12.000 | He points out that in his long list of things that you'd need to understand the image,
00:29:16.000 | we have made progress on some--the ability to, again, detect people
00:29:19.000 | and do face recognition for well-known individuals.
00:29:22.000 | But that's kind of about it.
00:29:24.000 | And he wasn't particularly optimistic that the current route that's being pursued in industry
00:29:28.000 | is anywhere close to solving or even really trying to solve these larger questions.
00:29:34.000 | If we give this image to that caption bot,
00:29:38.000 | what we see is, again, represents the same point.
00:29:41.000 | So here's the caption bot.
00:29:42.000 | It says, "I think it's a group of people standing next to a man in a suit and tie."
00:29:46.000 | So that's right as far as it goes.
00:29:49.000 | It just doesn't go far enough, and the current ideas of build a data set,
00:29:54.000 | train a deep learning algorithm on it, and then repeat
00:29:58.000 | aren't really even, I would venture, trying to get to what we're talking about.
00:30:03.000 | Or here's another--I'll just give you one other example of a couple of photographs
00:30:06.000 | from my recent vacation in a nice, warm, tropical locale,
00:30:11.000 | which I think illustrate, again, the gap where we have machines
00:30:15.000 | that can, say, beat the world's best at Go,
00:30:19.000 | but can't even beat a child at tic-tac-toe.
00:30:21.000 | Now, what do I mean by that?
00:30:23.000 | Well, of course, we don't even need reinforcement learning or deep learning
00:30:26.000 | to build a machine that can win or tie, do optimally in tic-tac-toe.
00:30:31.000 | But think about this.
00:30:32.000 | This is a real tic-tac-toe game, which I saw on the grass outside my hotel.
00:30:37.000 | What do you have to do to look at this and recognize that it's a tic-tac-toe game?
00:30:40.000 | You have to see the objects.
00:30:41.000 | You have to see what's--in some sense, there's a 3-by-3 grid,
00:30:45.000 | but it's only abstract.
00:30:46.000 | It's only delimited by these ropes or strings.
00:30:51.000 | It's not actually a grid in any simple geometric sense.
00:30:55.000 | But yet a child can look at that--and indeed, here's an actual child
00:30:57.000 | who was looking at it--and recognize, "Oh, it's a game of tic-tac-toe,"
00:31:00.000 | and even know what they need to do to win, namely put the X and complete it,
00:31:03.000 | and now they've got three in a row.
00:31:04.000 | That's literally child's play.
00:31:07.000 | You show this sort of thing, though, to one of these image-understanding
00:31:10.000 | caption bots, and I think it's a close-up of a sign.
00:31:14.000 | Again, saying that this is a close-up of a sign is not the same thing,
00:31:22.000 | I would venture, as a cognitive or computational activity
00:31:25.000 | that's going to give us what we need to, say, recognize the object,
00:31:28.000 | to recognize it as a game, to understand the goal,
00:31:30.000 | and how to plan to achieve those goals.
00:31:32.000 | Whereas this kind of architecture is designed to try to do
00:31:35.000 | all of these things, ultimately.
00:31:37.000 | I bring in these examples of games or jokes to really show where perception
00:31:43.000 | goes to cognition, all the way up to symbols.
00:31:47.000 | So to get objects and forces and mental states, that's the cognitive core,
00:31:52.000 | but to be able to get goals and plans and what do I do
00:31:56.000 | or how do I talk about it, that's symbols.
00:31:59.000 | Here's another way into this, and it's one that also motivates, I think,
00:32:02.000 | a lot of really good work on the engineering side,
00:32:04.000 | and a lot of our interest in the science side, is think about robotics
00:32:08.000 | and think about what do you have to do to --
00:32:11.000 | what does the brain have to be like to control the body?
00:32:14.000 | So again, you're going to hear shortly, I think maybe it's next week,
00:32:18.000 | from Marc Raibert, who's one of the founders of Boston Dynamics,
00:32:22.000 | which is one of my favorite companies anywhere.
00:32:25.000 | They're without doubt the leading maker of humanoid robots,
00:32:29.000 | legged locomoting robots in industry.
00:32:32.000 | They have all sorts of other really cool robots, robots like dogs,
00:32:36.000 | robots that have -- I think you'll even get to see a live demonstration
00:32:40.000 | of one of these robots. It's really awesome, impressive stuff.
00:32:44.000 | But what about the minds and brains of these robots?
00:32:47.000 | Well, again, if you ask Marc how much human-like cognition
00:32:51.000 | they have in their robots, I think he would say very little.
00:32:54.000 | In fact, we have asked him that, and he would say very little.
00:32:57.000 | He has said very little.
00:32:59.000 | He's actually one of the advisors of our center, and I think in many ways
00:33:02.000 | we're very much on the same page. We both want to know,
00:33:05.000 | how do you build the kind of intelligence that can control these bodies
00:33:08.000 | like the way a human does?
00:33:11.000 | Here's another example of an industry robotics effort.
00:33:13.000 | This is Google's Arm Farm, where they've got lots of robot arms,
00:33:16.000 | and they're trying to train them to pick up objects using various kinds
00:33:19.000 | of deep learning and reinforcement learning techniques.
00:33:22.000 | I think it's one approach. I just think it's very, very different
00:33:25.000 | from the way humans learn to, say, control their body and manipulate objects.
00:33:29.000 | You can see that in terms of things that go back to what you were saying
00:33:32.000 | when you were introducing me. Think about how quickly we learn things.
00:33:35.000 | Here you have the Arm Farmers trying to generate, effectively,
00:33:39.000 | maybe if not infinite, but hundreds of thousands, millions of examples
00:33:43.000 | of reaches and pickups of objects, even with just a single gripper.
00:33:47.000 | Yet a child, who in some ways can't control their body
00:33:50.000 | nearly as well as robots can be controlled at the low level,
00:33:54.000 | is able to do so much more.
00:33:56.000 | I'll show you two of my favorite videos from YouTube here,
00:33:59.000 | which motivate some of the research that we're doing.
00:34:01.000 | The one on the left is a one-and-a-half-year-old,
00:34:03.000 | and the other one's a one-year-old.
00:34:05.000 | Just watch this one-and-a-half-year-old here doing a popular activity
00:34:08.000 | for many kids. Is it playing?
00:34:13.000 | Do you see a video up there?
00:34:18.000 | Hmm. Okay, there we go.
00:34:20.000 | Okay, so he's doing this stacking cup activity.
00:34:24.000 | He's stacking up cups to make a tall tower.
00:34:27.000 | He's got a stack of three, and what you can see
00:34:29.000 | from the first part of this video is it looks like he's trying
00:34:32.000 | to make a second stack that he's trying to pick up at once.
00:34:35.000 | Basically, he's trying to make a stack of two
00:34:38.000 | that'll go on the stack of three.
00:34:40.000 | And he's trying to debug his plan, because it got a little bit stuck here.
00:34:44.000 | And think about it. I mean, again, if you know anything
00:34:48.000 | about robots manipulating objects, even just what he just did,
00:34:51.000 | no robot can decide to do that and actually do it.
00:34:54.000 | At some point, he's almost got it. It's a little bit tricky,
00:34:57.000 | but at some point he's going to get that stack of two.
00:35:00.000 | He realizes he has to move that object out of the way.
00:35:02.000 | Look at what he just did. Move it out of the way,
00:35:04.000 | use two hands to pick it up.
00:35:06.000 | And now he's got a stack of two on a stack of three,
00:35:08.000 | and suddenly, you know, sub-goal completed.
00:35:10.000 | He's now got a stack of five.
00:35:12.000 | He's got a stack of 10, because he knows he accomplished
00:35:14.000 | a key waypoint along the way to his final goal.
00:35:17.000 | That's a kind of early symbolic cognition, right?
00:35:19.000 | To understand that I'm trying to build a tall tower,
00:35:22.000 | but a tower is made up of little towers.
00:35:24.000 | And you can take a tower and put it on top of another tower,
00:35:27.000 | or stack a stack on a stack, and you have a bigger stack.
00:35:30.000 | So think about how he goes from bottom-up perception
00:35:33.000 | to the objects, to the physics needed to manipulate the objects,
00:35:36.000 | to the ability to make even those early kinds of symbolic plans.
00:35:39.000 | At some point, he keeps doing this.
00:35:41.000 | He puts another stack on there.
00:35:43.000 | I'll just jump to the end.
00:35:45.000 | Oops, sorry, you missed--sorry.
00:35:47.000 | He gets really excited,
00:35:49.000 | and he gives himself another big hand, but falls over.
00:35:52.000 | OK, again, Boston Dynamics now has robots
00:35:55.000 | that could pick themselves up after that.
00:35:57.000 | That's really impressive, again.
00:35:59.000 | But all the other stuff to get to that point,
00:36:01.000 | we don't really know how to do in a robotic setting.
00:36:03.000 | Or think about this baby here. This is a younger baby.
00:36:06.000 | This is one of the Internet's very most popular videos
00:36:09.000 | because it features a baby and a cat.
00:36:11.000 | (laughter)
00:36:13.000 | But the baby's doing something interesting.
00:36:15.000 | He's got the same cups, but he's decided--
00:36:17.000 | he's, again, decided to try a new thing.
00:36:19.000 | So think about creativity.
00:36:21.000 | He's decided that his goal
00:36:23.000 | is to stack up cups on the back of a cat, I guess.
00:36:25.000 | He's asking, "How many cups can I fit on the back of a cat?"
00:36:28.000 | Well, three. Let's see, can I fit more?
00:36:31.000 | Let's try another one.
00:36:33.000 | OK, well, he can't fit more than three, it turns out.
00:36:35.000 | And then he-- then, ugh, it's not working.
00:36:37.000 | So he changes his goal.
00:36:39.000 | Now his goal appears to be to get the cups
00:36:41.000 | on the other side of the cat.
00:36:43.000 | Now watch that part when he reaches back behind him there.
00:36:45.000 | That's--I'll just pause it there for a moment.
00:36:47.000 | So when he just reached back there,
00:36:49.000 | that's a particularly striking moment in the video.
00:36:51.000 | It shows a very strong form
00:36:53.000 | of what we call in cognitive science object permanence.
00:36:56.000 | That's the idea that you represent objects
00:36:59.000 | as these permanent, enduring entities in the world,
00:37:01.000 | even when you can't see them.
00:37:03.000 | In this case, he hadn't seen or touched that object behind him
00:37:06.000 | for, like, at least a minute, right?
00:37:08.000 | Maybe much longer, I don't know.
00:37:10.000 | And yet he still knew it was there,
00:37:12.000 | and he was able to incorporate it in his plan.
00:37:14.000 | There's a moment before that when he's about to reach for it,
00:37:16.000 | but then he sees this other one, right?
00:37:18.000 | And it's only when he's now exhausted all the other objects here
00:37:20.000 | that he can see, he's like, "OK, now time to get this object
00:37:22.000 | and bring it into play," right?
00:37:24.000 | So think about what has to be going on in his brain
00:37:26.000 | for him to be able to do that, right?
00:37:28.000 | That's like the analog of you understanding what's behind you.
00:37:31.000 | It's not that these things are impossible to capture in machines.
00:37:33.000 | Far from it.
00:37:35.000 | It's just that training a deep neural network
00:37:37.000 | or any kind of pattern recognition system,
00:37:39.000 | we don't think is going to do it.
00:37:41.000 | But we think by reverse engineering how it works in the brain,
00:37:43.000 | we might be able to do it.
00:37:45.000 | I believe we can do it.
00:37:47.000 | It's not just humans that do this kind of activity.
00:37:49.000 | Here's a couple of, again, rather famous videos.
00:37:51.000 | You can watch all of these on YouTube.
00:37:53.000 | Crows are famous object manipulators and tool users,
00:37:55.000 | but also orangutans, other primates, rodents.
00:37:59.000 | We can watch--here, let me pause this one for a second.
00:38:02.000 | If we watch this orangutan here, he's got a bunch of big Legos,
00:38:05.000 | and over the course of this video,
00:38:07.000 | he's building up a stack of Legos.
00:38:11.000 | It's really quite impressive.
00:38:13.000 | I'm just jumping to the end.
00:38:17.000 | There's actually some controversy out there
00:38:19.000 | of whether this video is a fake.
00:38:21.000 | But the controversy isn't about--
00:38:23.000 | it's not like whether it was, I don't know,
00:38:25.000 | done with computer animation.
00:38:27.000 | Some people think the video was actually filmed backwards,
00:38:29.000 | that a human built up the stack,
00:38:31.000 | and the orangutan just slowly disassembled it piece by piece.
00:38:33.000 | And it turns out it's remarkably hard to tell
00:38:35.000 | whether it's played forward or backwards in time,
00:38:37.000 | and people have argued over little details.
00:38:39.000 | Because it would be quite impressive
00:38:41.000 | if an orangutan actually was able to build up
00:38:43.000 | this really impressive stack of Legos.
00:38:45.000 | But I would submit that it would be almost as impressive
00:38:47.000 | if he disassembled it.
00:38:49.000 | Think about the activity.
00:38:51.000 | I mean, if I wanted to disassemble that,
00:38:53.000 | the easiest thing to do would just be to knock it over.
00:38:55.000 | But to piece by piece disassemble it,
00:38:57.000 | even if it's played backwards like this,
00:38:59.000 | that's still a really impressive act
00:39:01.000 | of symbolic planning on physical objects.
00:39:03.000 | Or here you've got this famous mouse.
00:39:05.000 | This you can find on the internet
00:39:07.000 | under the "Mouse vs. Cracker" video.
00:39:09.000 | And what you'll see here over the course of this video
00:39:11.000 | is a mouse valiantly and mostly hopelessly
00:39:13.000 | struggling with a cracker
00:39:15.000 | that they're hoping to bring back to their nest.
00:39:17.000 | I guess it's a very appealing big meal.
00:39:19.000 | And at some point, after just trying and trying
00:39:21.000 | to get it over the wall,
00:39:23.000 | the mouse just gives up,
00:39:25.000 | because it's just never going to happen.
00:39:27.000 | And he just goes away.
00:39:33.000 | Except that because even mouses can dream,
00:39:35.000 | or mice can dream,
00:39:37.000 | at some point he decides, "Okay, I'm just going to come back
00:39:39.000 | for one more try."
00:39:41.000 | And he tries one more time, and this time valiantly gets it over.
00:39:43.000 | Isn't that very impressive?
00:39:45.000 | Congratulations, mouse.
00:39:47.000 | You can clap for me at the end, or clap for whoever later.
00:39:49.000 | But I want to applaud the mouse there
00:39:51.000 | every time I see that.
00:39:53.000 | But again, think what had to be going on in his brain
00:39:55.000 | to be able to do that.
00:39:57.000 | It's a crazy thing, and yet he formulated
00:39:59.000 | the goal and was able to achieve it.
00:40:01.000 | I'll just show one more video
00:40:03.000 | that is really more about science.
00:40:05.000 | These other ones are, some of them actually were
00:40:07.000 | from scientific experiments.
00:40:09.000 | But this is one that motivates a lot of the science
00:40:11.000 | that I do, and to me it sets up
00:40:13.000 | a grand cognitive science challenge
00:40:15.000 | for AI and robotics.
00:40:17.000 | It's from an experiment with humans, again,
00:40:19.000 | 18-month-olds or 1-1/2-year-olds.
00:40:21.000 | The kids in this experiment were the same age as
00:40:23.000 | the first baby I showed you, the one who did the stacking.
00:40:25.000 | And 18 months is really a very
00:40:27.000 | good age to study if you're
00:40:29.000 | interested in intelligence, for reasons we can talk
00:40:31.000 | about later if you're interested.
00:40:33.000 | This is from a very famous experiment done
00:40:35.000 | by two psychologists, Felix Warneken
00:40:37.000 | and Michael Tomasello.
00:40:39.000 | It was studying the spontaneous helping
00:40:41.000 | behavior of young children.
00:40:43.000 | It also contrasted humans and chimps.
00:40:45.000 | The punchline is that chimps sometimes
00:40:47.000 | do things that are kind of like what this human did,
00:40:49.000 | but not nearly as reliably or
00:40:51.000 | as flexibly.
00:40:53.000 | I'll show you a particular
00:40:55.000 | unusual situation
00:40:57.000 | where human kids had relatively
00:40:59.000 | little trouble figuring out what to do
00:41:01.000 | or even whether they should do it, whereas
00:41:03.000 | basically no chimp did what you're going to see
00:41:05.000 | humans sometimes doing here.
00:41:07.000 | The experimenter in this movie,
00:41:09.000 | and I'll turn on the sound here if you can hear it,
00:41:11.000 | the experimenter is the tall guy,
00:41:13.000 | and the participant is
00:41:15.000 | the little kid in the corner.
00:41:17.000 | There's
00:41:19.000 | sound but no words, right?
00:41:21.000 | And at some point he
00:41:23.000 | stops and then the kid just does whatever they want
00:41:25.000 | to do. So watch what he does. He goes over,
00:41:27.000 | he opens the cabinet,
00:41:29.000 | looks inside, then he
00:41:31.000 | steps back and he looks up
00:41:33.000 | at Felix and then looks down,
00:41:35.000 | and then the action is completed.
00:41:37.000 | I want you to watch it one more
00:41:39.000 | time and think about what's got to be going on inside
00:41:41.000 | the kid's head to understand this.
00:41:43.000 | So it seems like
00:41:45.000 | what it looks like to us is the kid figured out that this
00:41:47.000 | guy needed help and helped him. And the paper
00:41:49.000 | is full of many other situations like this.
00:41:51.000 | This is just one. But the key idea
00:41:53.000 | is that the situation is somewhat novel.
00:41:55.000 | People have seen people holding books and
00:41:57.000 | opening cabinets, but probably it's
00:41:59.000 | very rare to see this kind of situation
00:42:01.000 | exactly. It's different in some important
00:42:03.000 | details from what you might have seen before.
00:42:05.000 | And there's other ones in there that are really truly novel because
00:42:07.000 | they just made up a machine right there.
00:42:09.000 | But somehow he has to understand
00:42:11.000 | causally from the way the guy
00:42:13.000 | is banging the books against the thing.
00:42:15.000 | It's sort of both a symbol, but it's also
00:42:17.000 | somehow he's got to understand what he
00:42:19.000 | can do and what he can't do, and then what
00:42:21.000 | the kid can do to help.
00:42:23.000 | I'll show this again, but really just
00:42:25.000 | watch. The main part I want you to see
00:42:29.000 | I'll just
00:42:31.000 | sort of skip ahead.
00:42:33.000 | So watch this part
00:42:35.000 | here. Let's say I'll just jump.
00:42:37.000 | Right now he's about
00:42:39.000 | to look up. He looks up and makes
00:42:41.000 | eye contact, and then his eyes look down.
00:42:43.000 | So again,
00:42:45.000 | he looks up,
00:42:47.000 | he looks up, and then a saccade, a
00:42:49.000 | sudden rapid eye movement down, down
00:42:51.000 | to his hands, up, down.
00:42:53.000 | Again, that's this brain OS
00:42:55.000 | in action. He's making one glance,
00:42:57.000 | small glance, at
00:42:59.000 | the big guy's eyes to make
00:43:01.000 | eye contact, to see, to get a signal,
00:43:03.000 | did I understand what you
00:43:05.000 | wanted, and did you register
00:43:07.000 | that joint attention. And then
00:43:09.000 | he makes a prediction about what the guy's going to do,
00:43:11.000 | so he looks right down. He doesn't just
00:43:13.000 | look around randomly. He looks right down
00:43:15.000 | to the guy's hands to track the
00:43:17.000 | action that he expects to see happening.
00:43:19.000 | If I did the right thing to help you, then I expect
00:43:21.000 | you're going to put the books there.
00:43:23.000 | So you can see these things happening, and we
00:43:25.000 | want to know what's going on inside the mind that guides
00:43:27.000 | all of that.
00:43:29.000 | So that's this sort of big scientific agenda
00:43:31.000 | that we're working on over the next few years,
00:43:33.000 | where we think that a scientific
00:43:35.000 | understanding of human
00:43:37.000 | intelligence could
00:43:39.000 | lead to all sorts of AI advances.
00:43:41.000 | In particular, suppose we could build
00:43:43.000 | a robot that could do what this kid and many other
00:43:45.000 | kids in these experiments do, to say, "Help you
00:43:47.000 | out around the house without having to be programmed
00:43:49.000 | or even really instructed, just to kind of
00:43:51.000 | get a sense. Oh yeah, you need a hand with that?
00:43:53.000 | Sure, let me help you out."
00:43:55.000 | Even 18-month-olds will do that. Sometimes
00:43:57.000 | not very reliably or effectively. Sometimes
00:43:59.000 | they'll try to help and really do the opposite.
00:44:01.000 | But imagine if you could
00:44:03.000 | take the flexible
00:44:05.000 | understanding of humans' actions,
00:44:07.000 | goals, and so on, and make those
00:44:09.000 | reliable engineering technology. That would be
00:44:11.000 | very useful. And it would also be
00:44:13.000 | related to, say, machines that you could actually
00:44:15.000 | start to talk to and trust in some
00:44:17.000 | ways, that shared understanding.
00:44:19.000 | So how are we going to do this? Well, let me spend
00:44:21.000 | the rest of the time talking about how we try
00:44:23.000 | to do this. Some of the
00:44:25.000 | technology that we're building both in our
00:44:27.000 | group and more broadly to try to make
00:44:29.000 | these kinds of architectures real.
00:44:31.000 | And I'll talk about two or
00:44:33.000 | three technical ideas. Again, not in any
00:44:35.000 | detail. One
00:44:37.000 | is the idea of a probabilistic program.
00:44:39.000 | So this is a kind of
00:44:41.000 | a... think of it as
00:44:43.000 | a computational
00:44:45.000 | abstraction that we can use to capture
00:44:47.000 | the common sense knowledge of this core
00:44:49.000 | cognition. So when I say we have an intuitive
00:44:51.000 | understanding of physical objects and people's goals,
00:44:53.000 | how do I build a model
00:44:55.000 | of that model you have in the head?
00:44:57.000 | Probabilistic programs, a little bit more technically,
00:44:59.000 | are... one way to understand
00:45:01.000 | them is as a generalization of Bayesian
00:45:03.000 | networks or other kinds of directed graphical
00:45:05.000 | models, if you know those.
00:45:07.000 | But where instead of defining a probability
00:45:09.000 | model on a graph,
00:45:11.000 | you define it on a program.
00:45:13.000 | And thereby
00:45:15.000 | have access to a much more
00:45:17.000 | expressive toolkit of knowledge representation.
00:45:19.000 | So data structures, other kinds
00:45:21.000 | of algorithmic tools for representing knowledge.
00:45:23.000 | But you still have access
00:45:25.000 | to the ability to do probabilistic inference,
00:45:27.000 | like in a graphical model,
00:45:29.000 | but also causal inference in a
00:45:31.000 | directed graphical model. So for those of you
00:45:33.000 | who know about graphical models, that might make some
00:45:35.000 | sense to you. But just more broadly, what this is,
00:45:37.000 | think of this as a toolkit that allows
00:45:39.000 | us to combine several of the best
00:45:41.000 | ideas, not just of the recent
00:45:43.000 | deep learning era, but over... if you look back over
00:45:45.000 | the whole scope of AI as well as
00:45:47.000 | cognitive science, I think there's three or
00:45:49.000 | four ideas, and
00:45:51.000 | more, but definitely like three ideas we can really
00:45:53.000 | put up there that have proven
00:45:55.000 | their worth and have risen
00:45:57.000 | and fallen in terms of... each of these had
00:45:59.000 | ideas when the mainstream of the field thought
00:46:01.000 | this was totally the way to go and every other idea
00:46:03.000 | was obviously a waste of time.
00:46:05.000 | And also had its time when
00:46:07.000 | many people thought it was a waste of time.
00:46:09.000 | And these three big ideas, I would say,
00:46:11.000 | are first of all the idea of symbolic
00:46:13.000 | representation or symbolic languages
00:46:15.000 | for knowledge representation, probabilistic
00:46:17.000 | inference in generative models
00:46:19.000 | to capture uncertainty, ambiguity,
00:46:21.000 | learning from sparse data, and in their
00:46:23.000 | hierarchical setting, learning to learn.
00:46:25.000 | And then, of course,
00:46:27.000 | the recent developments with neural
00:46:29.000 | inspired architectures for pattern recognition.
00:46:31.000 | Each of these
00:46:33.000 | things, each of these ideas, symbolic
00:46:35.000 | languages, probabilistic inference, and
00:46:37.000 | neural networks, has some distinctive strengths
00:46:39.000 | that are real weak points of the other
00:46:41.000 | approaches. So to take one example that I
00:46:43.000 | haven't really talked about here,
00:46:45.000 | one that is often mentioned
00:46:47.000 | as an outstanding challenge for neural networks:
00:46:49.000 | transfer learning, or learning
00:46:51.000 | to take knowledge across a number of previous
00:46:53.000 | tasks and transfer it to others. This is a real challenge
00:46:55.000 | and has always been a challenge in a neural net.
00:46:57.000 | But it's something that's addressed
00:46:59.000 | very naturally and very scalably
00:47:01.000 | in, for example, a hierarchical Bayesian model.
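To make the transfer-learning point concrete, here is a minimal sketch (not the speaker's model, and not tied to any particular library) of how a hierarchical Bayesian setup shares strength across tasks: a prior learned from many old tasks lets a new task be learned from a single observation. All names and numbers are illustrative.

```python
# A minimal sketch of transfer via a shared prior: each task has its own
# parameter theta_k, all drawn from a common prior Normal(mu, tau^2).
# Learning mu and tau from many tasks lets a new task with very little
# data "borrow strength" from the others.
import random, statistics

random.seed(0)

# Hypothetical data: 20 previous tasks, each with a handful of noisy observations.
true_mu, true_tau, noise = 3.0, 1.0, 0.5
tasks = []
for _ in range(20):
    theta = random.gauss(true_mu, true_tau)
    tasks.append([random.gauss(theta, noise) for _ in range(5)])

# "Learning to learn": estimate the shared prior from the old tasks.
task_means = [statistics.mean(obs) for obs in tasks]
mu_hat = statistics.mean(task_means)
tau_hat = statistics.stdev(task_means)

# Transfer: a brand-new task observed only once.
x_new = random.gauss(random.gauss(true_mu, true_tau), noise)

# Posterior mean for the new task's theta shrinks the lone observation
# toward the learned prior (standard Gaussian conjugate update).
w = tau_hat**2 / (tau_hat**2 + noise**2)
theta_new = w * x_new + (1 - w) * mu_hat
print(f"one observation: {x_new:.2f}, posterior estimate: {theta_new:.2f}")
```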
00:47:03.000 | And if you look at some of the recent attempts,
00:47:05.000 | really interesting attempts within the
00:47:07.000 | deep learning world to try to get kinds of transfer
00:47:09.000 | learning and learning to learn, they're really cool.
00:47:11.000 | But many of them are
00:47:13.000 | in some ways kind of reinventing
00:47:15.000 | within a neural network paradigm, ideas
00:47:17.000 | that people, maybe just 10 or 15
00:47:19.000 | years ago, developed in very sophisticated
00:47:21.000 | ways in, let's say, hierarchical Bayesian
00:47:23.000 | models. And a lot of attempts
00:47:25.000 | to get sort of symbolic
00:47:27.000 | algorithm-like behavior in neural networks
00:47:29.000 | again, are really, you know,
00:47:31.000 | they're very small steps towards something
00:47:33.000 | which is a very mature technology
00:47:35.000 | in computer systems and programming languages.
00:47:37.000 | Probabilistic programs,
00:47:39.000 | I'll just sort of advertise mostly,
00:47:41.000 | are a way to combine the strengths
00:47:43.000 | of all of these approaches, to have
00:47:45.000 | knowledge representations which are as expressive
00:47:47.000 | as anything that anybody ever did in the symbolic
00:47:49.000 | paradigm, that are as flexible
00:47:51.000 | at dealing with uncertainty and sparse
00:47:53.000 | data as anything in the probabilistic paradigm,
00:47:55.000 | but that also can support pattern
00:47:57.000 | recognition tools to be
00:47:59.000 | able to, for example, do very fast,
00:48:01.000 | efficient inference in very complex
00:48:03.000 | scenarios. And there's a number of
00:48:05.000 | -- that's the kind of conceptual framework.
00:48:07.000 | There's a number of actually implemented tools.
00:48:09.000 | I point to here on the slide
00:48:11.000 | a number of probabilistic programming languages
00:48:13.000 | which you can go explore.
00:48:15.000 | For example, there's one that was developed
00:48:17.000 | in our group a few years ago, almost 10 years ago
00:48:19.000 | now, called Church, which was the antecedent
00:48:21.000 | of some of these other languages built on a
00:48:23.000 | functional programming core. So Church is a probabilistic
00:48:25.000 | programming language built on the lambda
00:48:27.000 | calculus, or really in Lisp, basically.
00:48:29.000 | But there are many other
00:48:31.000 | more modern tools, especially
00:48:33.000 | if you are interested in neural networks.
00:48:35.000 | There are tools like, for example, Pyro
00:48:37.000 | or ProbTorch or BayesFlow
00:48:39.000 | that try to combine all
00:48:41.000 | these ideas -- or, for example,
00:48:43.000 | the one here, which is a project of Vikash Mansinghka's
00:48:45.000 | Probabilistic Computing Group.
00:48:47.000 | These are all things which are
00:48:49.000 | just in the very beginning stages, very,
00:48:51.000 | very alpha. But you can find
00:48:53.000 | out more about them online or by writing
00:48:55.000 | to their creators. And I think this is
00:48:57.000 | a very exciting place where
00:48:59.000 | the convergence of a number of different AI
00:49:01.000 | tools are happening.
00:49:03.000 | And this will be absolutely
00:49:05.000 | necessary for making the kind of architecture that
00:49:07.000 | I'm talking about work.
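As a rough illustration of what "defining a probability model on a program rather than a graph" can mean, here is a toy probabilistic program in plain Python (deliberately not Church, Pyro, or any of the languages named above), with conditional inference done by naive rejection sampling. The model and its probabilities are made up.

```python
# A minimal, language-agnostic sketch of a probabilistic program: the model is
# an arbitrary program that makes random choices, and conditional inference is
# done here by simple rejection sampling over runs of that program.
import random

def generative_program():
    """A tiny 'world model': is the stack tall, and did it topple?"""
    tall = random.random() < 0.3              # prior: most stacks are short
    p_fall = 0.8 if tall else 0.1             # tall stacks usually fall
    fell = random.random() < p_fall
    return {"tall": tall, "fell": fell}

def infer(condition, query, n=100_000):
    """Estimate P(query | condition) by running the program many times
    and keeping only the runs consistent with the condition."""
    kept = [t for t in (generative_program() for _ in range(n)) if condition(t)]
    return sum(query(t) for t in kept) / len(kept)

# "The stack fell -- was it probably a tall one?"
print(infer(condition=lambda t: t["fell"], query=lambda t: t["tall"]))
```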
00:49:09.000 | Another key idea, which we've been building
00:49:11.000 | on in our lab, and I
00:49:13.000 | think, again, many people are using some version
00:49:15.000 | of this idea, but maybe a little bit different
00:49:17.000 | from the way we're doing it,
00:49:19.000 | is -- well, the version of this idea
00:49:21.000 | that I like to talk about is what I call the game
00:49:23.000 | engine in the head. So this is the idea
00:49:25.000 | that it's really what the
00:49:27.000 | programs are about. When I talk about probabilistic
00:49:29.000 | programs, I haven't said anything about what
00:49:31.000 | kind of programs we're using. We're just basically --
00:49:33.000 | these probabilistic programming languages at their best
00:49:35.000 | and Church, the language that
00:49:37.000 | was developed by Noah Goodman and Vikash
00:49:39.000 | and others and Dan Roy and our group some
00:49:41.000 | 10 years ago, was intended to be a
00:49:43.000 | Turing-complete probabilistic programming
00:49:45.000 | language. So any probability model
00:49:47.000 | that was computable or for whose
00:49:49.000 | inferences -- conditional inferences -- are computable,
00:49:51.000 | you could represent in these languages.
00:49:53.000 | But that leaves completely open
00:49:55.000 | what I'm actually going to -- what
00:49:57.000 | kind of program I'm going to write to model
00:49:59.000 | the world. And I've been very
00:50:01.000 | inspired in the last few years by
00:50:03.000 | thinking about the kinds of programs that are in modern
00:50:05.000 | video game engines.
00:50:07.000 | So again, probably most of you are familiar with these,
00:50:09.000 | but if you're -- and increasingly they're playing
00:50:11.000 | a role in all sorts of ways in AI. But these are
00:50:13.000 | tools that were developed by the video game industry
00:50:15.000 | to allow a game designer
00:50:17.000 | to make a new game
00:50:19.000 | without having to do most of --
00:50:21.000 | in some sense, most of the hard technical work
00:50:23.000 | from scratch, but rather to focus
00:50:25.000 | on the characters, the world,
00:50:27.000 | the story, the things that are more
00:50:29.000 | interesting for designing a
00:50:31.000 | novel game. In particular,
00:50:33.000 | if we want a player to explore some
00:50:35.000 | new three-dimensional world,
00:50:37.000 | but to have them be able to interact with the world in real
00:50:39.000 | time and to render nice-looking
00:50:41.000 | graphics in real time
00:50:43.000 | in an interactive way as the player moves around
00:50:45.000 | and explores the world. Or if you want to
00:50:47.000 | populate the world with non-player characters that
00:50:49.000 | will behave in an even vaguely intelligent way.
00:50:51.000 | Okay? Game engines give you
00:50:53.000 | tools for doing all of this without having to write
00:50:55.000 | all of the graphics from scratch or all
00:50:57.000 | of the physics -- the rules of physics --
00:50:59.000 | from scratch. So there are what are called
00:51:01.000 | game physics engines, which in some
00:51:03.000 | sense are a set of principles,
00:51:05.000 | but also hacks, from Newtonian mechanics
00:51:07.000 | and other areas of physics, that allow you to
00:51:09.000 | simulate plausible-looking physical
00:51:11.000 | interactions in very complex worlds
00:51:13.000 | very approximately, but
00:51:15.000 | very fast. There's also what's called
00:51:17.000 | game AI, which are basically very simple
00:51:19.000 | planning models. So let's say I want to
00:51:21.000 | have an AI in the game that is
00:51:23.000 | like a guard that guards a base, and a
00:51:25.000 | player is going to attack this base. So back in
00:51:27.000 | the old Atari days, like when I was a kid,
00:51:29.000 | the guards would just be like random
00:51:31.000 | things that would fire missiles kind of randomly
00:51:33.000 | in random directions at random times, right?
00:51:35.000 | But let's say you want a guard to be a little intelligent,
00:51:37.000 | so to actually look around and
00:51:39.000 | "Oh, and I see the player," and then to actually start
00:51:41.000 | shooting at you and to even maybe pursue you.
00:51:43.000 | So that requires putting a little AI in
00:51:45.000 | the game, and you do that by having
00:51:47.000 | basically simple agent models in the game.
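A toy rendition of the kind of "simple agent model" behind game AI, along the lines of the guard example: the agent does nothing intelligent until the player enters its field of view, then pursues. This is illustrative only, not any real game engine's API.

```python
# A toy guard agent: idle until the player comes into view, then move toward
# the player each tick. Purely illustrative; no real game engine is assumed.
from dataclasses import dataclass

@dataclass
class Guard:
    x: float
    y: float
    sight: float = 5.0
    speed: float = 1.0

    def step(self, player_x: float, player_y: float) -> str:
        dx, dy = player_x - self.x, player_y - self.y
        dist = (dx * dx + dy * dy) ** 0.5
        if dist > self.sight:
            return "patrol"                      # player unseen: keep idling
        if dist < 1e-9:
            return "caught"                      # already at the player
        step = min(self.speed, dist)
        self.x += step * dx / dist               # seen: pursue the player
        self.y += step * dy / dist
        return "pursue"

guard = Guard(x=0.0, y=0.0)
for t in range(8):
    print(t, guard.step(player_x=4.0, player_y=3.0), round(guard.x, 2), round(guard.y, 2))
```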
00:51:49.000 | So what we think,
00:51:51.000 | and some of you might think this is crazy, and some
00:51:53.000 | of you might think this is a very natural idea
00:51:55.000 | -- I get both kinds of reactions -- what
00:51:57.000 | we think is that these tools of
00:51:59.000 | fast approximate renderers,
00:52:01.000 | physics engines, and sort of very simple
00:52:03.000 | kinds of AI planning are an
00:52:05.000 | interesting first approximation
00:52:07.000 | to the kinds of common sense knowledge
00:52:09.000 | representations that evolution has built into
00:52:11.000 | our brains. So when we talk about the
00:52:13.000 | cognitive core, or how
00:52:15.000 | do babies start,
00:52:17.000 | ways in which a baby's brain isn't
00:52:19.000 | a blank slate, one interesting idea
00:52:21.000 | is that it starts with something like
00:52:23.000 | these tools, and then wrapped
00:52:25.000 | inside a framework for probabilistic inference
00:52:27.000 | -- that's what we mean by probabilistic programs --
00:52:29.000 | that can support many activities
00:52:31.000 | of common sense perception and thinking.
00:52:33.000 | So I'll just give you one example
00:52:35.000 | of what we call this intuitive physics
00:52:37.000 | engine. So this is
00:52:39.000 | work that we did in our groups, that Pete Battaglia
00:52:41.000 | and Jess Hamrick started this work
00:52:43.000 | about five years ago now,
00:52:45.000 | where we showed
00:52:47.000 | people, in some sense,
00:52:49.000 | and this is also an illustration of a kind of
00:52:51.000 | experiment that you might do. When I keep talking
00:52:53.000 | about science, like I'll show you now a couple of experiments.
00:52:55.000 | So we would show people
00:52:57.000 | simple physical scenes, like these blocks
00:52:59.000 | world scenes, and ask them to make a number of judgments.
00:53:01.000 | And the model we built does
00:53:03.000 | basically a little bit of probabilistic inference
00:53:05.000 | in a game-style physics engine.
00:53:07.000 | It perceives the physical state and
00:53:09.000 | imagines a few different possible ways
00:53:11.000 | the world could go over the next one or two
00:53:13.000 | seconds, to answer questions like
00:53:15.000 | "Will the stack of blocks fall?" or
00:53:17.000 | "If they fall, how far will they fall?" or "Which
00:53:19.000 | way will they fall?" or "What would happen if
00:53:21.000 | say, one color
00:53:23.000 | of blocks or one material, like the green
00:53:25.000 | stuff, is ten times heavier than the grey
00:53:27.000 | stuff?" or vice versa. "How will that change the
00:53:29.000 | direction of fall?" or "Look at those red and
00:53:31.000 | yellow blocks, some of
00:53:33.000 | which look like they should be falling, but aren't."
00:53:35.000 | So, why? Can you infer
00:53:37.000 | from the fact that they're not falling
00:53:39.000 | that one color block is much heavier than the
00:53:41.000 | other? Or let me show you a sort
00:53:43.000 | of a slightly weird task.
00:53:45.000 | It's like other behavioral experiments.
00:53:47.000 | Sometimes we do weird things
00:53:49.000 | so that we can test ways in which you use
00:53:51.000 | your knowledge that you didn't just
00:53:53.000 | learn from pattern recognition, but
00:53:55.000 | use it to do new kinds of tasks that you'd
00:53:57.000 | never seen before. So here's a task
00:53:59.000 | which, you know, many of you have maybe
00:54:01.000 | seen me talk about these things, so you might have seen
00:54:03.000 | this task, but probably only if you saw me give a talk
00:54:05.000 | around here before. We call this the
00:54:07.000 | red-yellow task, and again, we'll make this one interactive.
00:54:09.000 | So imagine
00:54:11.000 | that the table the blocks are on is bumped
00:54:13.000 | hard enough to knock some of the blocks
00:54:17.000 | onto the floor. So you tell me, "Is it more
00:54:19.000 | likely to be red blocks or yellow blocks?" What do you say?
00:54:21.000 | Red. Okay, good.
00:54:23.000 | How about here?
00:54:25.000 | Yellow. Good.
00:54:27.000 | How about here?
00:54:29.000 | Uh-huh. Here?
00:54:31.000 | Here?
00:54:33.000 | Okay.
00:54:35.000 | Here?
00:54:37.000 | Here?
00:54:39.000 | Okay.
00:54:41.000 | So you
00:54:43.000 | just experience for yourself what it's like to
00:54:45.000 | be a subject in one of these experiments. We just
00:54:47.000 | did the experiment here. The data's all captured on
00:54:49.000 | video, sort of, right? Okay.
00:54:51.000 | You can see that
00:54:53.000 | sometimes people were very quick, other times
00:54:55.000 | people were slower. Sometimes there was a lot of
00:54:57.000 | consensus, sometimes there was a little bit less consensus.
00:54:59.000 | Right? That reflects uncertainty.
00:55:01.000 | So again, there's a long history of studying this
00:55:03.000 | scientifically,
00:55:05.000 | but you can see
00:55:07.000 | the probabilistic
00:55:09.000 | inference at work. Probabilistic inference over
00:55:11.000 | what? Well, I would say one way to
00:55:13.000 | describe it is over one or a
00:55:15.000 | few short, low-precision simulations
00:55:17.000 | of the physics of these scenes.
00:55:19.000 | So here is what I mean by this. I'm going
00:55:21.000 | to show you a video of a game
00:55:23.000 | engine reconstruction of one of these scenes
00:55:25.000 | that simulates a small bump. So here's
00:55:27.000 | a small bump. Here's that same
00:55:29.000 | scene with the big bump. Okay. Now
00:55:31.000 | notice that at the micro level, different
00:55:33.000 | things happen. But at the cognitive
00:55:35.000 | or macro level that matters for common
00:55:37.000 | sense reasoning, the same thing happened. Namely, all
00:55:39.000 | the yellow blocks went over onto one side of the table
00:55:41.000 | and few or none of the red blocks did.
00:55:43.000 | So it didn't matter which of those simulations
00:55:45.000 | you ran in your head. You'd get the same answer in this
00:55:47.000 | case. This is one that's very easy and high
00:55:49.000 | confidence and quick. Also,
00:55:51.000 | you didn't have to run the simulation for very long.
00:55:53.000 | You only have to run it for a few time steps
00:55:55.000 | like that to see what's going to happen, or similarly here.
00:55:57.000 | You only have to run it for a few time steps.
00:55:59.000 | And it doesn't have to be even very accurate.
00:56:01.000 | Even a fair amount of imprecision
00:56:03.000 | will give you basically the same answer at the
00:56:05.000 | level that matters for common sense. So that's
00:56:07.000 | the kind of thing our model does. It runs a few
00:56:09.000 | low-precision simulations for a few
00:56:11.000 | time steps. But if you take the average of what
00:56:13.000 | happens there and you compare that with people's
00:56:15.000 | judgments, you get results like what I show
00:56:17.000 | you here. This scatter plot shows
00:56:19.000 | on the y-axis the average judgments
00:56:21.000 | of people. On the x-axis, the average judgments
00:56:23.000 | of this model. And it does a pretty good job. It's not
00:56:25.000 | perfect, but the model basically captures
00:56:27.000 | people's graded sense of what's going
00:56:29.000 | on in this scene and many of these others.
00:56:31.000 | Okay?
00:56:33.000 | And it doesn't do it with any learning.
00:56:35.000 | But I'll come back to that in a second. It just does it by
00:56:37.000 | probabilistic reasoning over a game physics simulation.
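Here is a minimal sketch of the "few short, low-precision simulations" idea, using a crude stability rule along one axis in place of a real game physics engine: perceive the block positions with noise, run a handful of quick simulations, and report the fraction in which the stack topples. Everything here (the rule, the noise level, the numbers) is illustrative.

```python
# A minimal sketch of the "few noisy, low-precision simulations" idea: judge
# whether a stack of blocks will fall by jittering the perceived block
# positions, checking a simple stability rule, and averaging over a handful
# of samples. A toy stand-in for a real game physics engine.
import random

random.seed(1)

def will_fall(block_x, noise=0.2, n_sims=10):
    """block_x: left edges of unit-width blocks, stacked bottom to top.
    Returns the fraction of noisy simulations in which the stack topples."""
    falls = 0
    for _ in range(n_sims):
        xs = [x + random.gauss(0, noise) for x in block_x]   # perceptual noise
        for i in range(1, len(xs)):
            # crude stability rule: the center of mass of everything above
            # block i-1 must sit over block i-1's footprint [x, x + 1]
            com = sum(x + 0.5 for x in xs[i:]) / len(xs[i:])
            if not (xs[i - 1] <= com <= xs[i - 1] + 1):
                falls += 1
                break
    return falls / n_sims

print("well-aligned stack:", will_fall([0.0, 0.05, -0.05, 0.02]))
print("precarious stack:  ", will_fall([0.0, 0.4, 0.8, 1.2]))
```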
00:56:39.000 | Now we can use,
00:56:41.000 | and we have used, the same kind of technology
00:56:43.000 | to capture in very simple
00:56:45.000 | forms, really just proofs of concept at this
00:56:47.000 | point, the kind of common sense physical
00:56:49.000 | scene understanding in a child playing
00:56:51.000 | with blocks or other objects
00:56:53.000 | or in what might go on in a young child's
00:56:55.000 | understanding of other people's actions, what we call
00:56:57.000 | the intuitive psychology engine, where
00:56:59.000 | now the probabilistic programs are defined
00:57:01.000 | over these kind of very simple planning
00:57:03.000 | and perception programs. And I won't go
00:57:05.000 | into any details. I'll just point to a couple
00:57:07.000 | of papers that my group played a very
00:57:09.000 | small role in, but we provided some models
00:57:11.000 | which together with some infant researchers,
00:57:13.000 | people working on, both of these are experiments
00:57:15.000 | that were done with
00:57:17.000 | 10 or 12 month infants, so younger
00:57:19.000 | than even some of the babies I showed you before,
00:57:21.000 | but basically like that youngest baby, the one
00:57:23.000 | with the cat. Here's an example
00:57:25.000 | of showing simple physical scenes.
00:57:27.000 | These are moving objects to
00:57:29.000 | 12 month olds where they saw
00:57:31.000 | a few objects bouncing around inside a
00:57:33.000 | gumball machine and after some point
00:57:35.000 | in time, the scene gets occluded.
00:57:37.000 | You'll see the scene is occluded and then after another
00:57:39.000 | period of time, one of the objects will appear at the
00:57:41.000 | bottom. And the question is, is that
00:57:43.000 | the object you expected to see or not?
00:57:45.000 | Is it expected or surprising? The standard
00:57:47.000 | way you study what infants know is by
00:57:49.000 | what's called looking time methods.
00:57:51.000 | Just like an adult, if I show you something that's surprising
00:57:53.000 | you might look longer.
00:57:55.000 | If you're bored, you'll look away.
00:57:57.000 | So you can do that same
00:57:59.000 | kind of thing with infants and by
00:58:01.000 | measuring how long they look at a scene,
00:58:03.000 | you can measure whether you've shown them something surprising
00:58:05.000 | or not.
00:58:07.000 | There are literally hundreds of studies, if not more,
00:58:09.000 | using looking time measures
00:58:11.000 | to study what infants know.
00:58:13.000 | But only with this paper that we
00:58:15.000 | published a few years ago, did we have
00:58:17.000 | a quantitative model where we were able to show
00:58:19.000 | a relation between inverse probability
00:58:21.000 | in this case and surprise. So things which were objectively
00:58:23.000 | lower probability under one
00:58:25.000 | of these probabilistic physics simulations across
00:58:27.000 | a number of different manipulations of
00:58:29.000 | how fast the objects were, where they were
00:58:31.000 | when the scene was occluded, how long the delay was,
00:58:33.000 | various physically relevant variables. How many
00:58:35.000 | objects there were of one type or another.
00:58:37.000 | Infants' expectations connected
00:58:39.000 | with this model. Or another paper that
00:58:41.000 | we published, that one was
00:58:43.000 | done, the experiments there were done by
00:58:45.000 | Ernő Téglás in Luca Bonatti's lab.
00:58:47.000 | Here is a study that was done just
00:58:49.000 | recently by Shari Liu in Liz
00:58:51.000 | Spelke's lab, they're at Harvard, but they're
00:58:53.000 | partners with us in CBMM,
00:58:55.000 | which was about infants' understanding of goals.
00:58:57.000 | So this is more like, again, understanding of agents
00:58:59.000 | in intuitive psychology, where in, again,
00:59:01.000 | in very simple cartoon scenes,
00:59:03.000 | you show an infant, an agent
00:59:05.000 | that seems to be doing something, like an animated
00:59:07.000 | cartoon character, but it jumps over
00:59:09.000 | a wall, or it rolls up a hill,
00:59:11.000 | or it jumps over a gap. And
00:59:13.000 | the question is, basically, how
00:59:15.000 | much does the agent want the goal that it seems
00:59:17.000 | to be trying to achieve? And what this study
00:59:19.000 | showed, and the models here were
00:59:21.000 | done by Tomer Ullman, was that infants
00:59:23.000 | appeared to be sensitive to the
00:59:25.000 | physical work done by the agent.
00:59:27.000 | The more work the agent did, in the sense
00:59:29.000 | of the integral of force
00:59:31.000 | applied over a path, the more
00:59:33.000 | the infants thought the agent
00:59:35.000 | wanted the goal.
00:59:37.000 | We think of this as representing what we sometimes
00:59:39.000 | call the naive utility calculus.
00:59:41.000 | So the idea that there's a basic calculus
00:59:43.000 | of cost and benefit,
00:59:45.000 | you know, we take actions which
00:59:47.000 | are a little bit costly to achieve goal states
00:59:49.000 | which give us some reward. That's the
00:59:51.000 | most basic way, the oldest way, to think about
00:59:53.000 | rational, intentional action. And it
00:59:55.000 | seems that even 10-month-olds understand some
00:59:57.000 | version of that, where the cost can be measured
00:59:59.000 | in physical terms.
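A rough sketch of the naive utility calculus as just described: approximate the cost of an action as force integrated over the path, assume a rational agent only acts when the reward exceeds the cost, and invert that assumption to infer how much the agent must have wanted the goal. The force profiles, rewards, and prior below are invented for illustration.

```python
# Sketch of the "naive utility calculus": cost = physical work done (force
# integrated over the path); an agent acts only if reward > cost; so observing
# a costly action shifts belief toward a high-value goal. Illustrative only.
def work_done(forces, distances):
    """Approximate the path integral of force as a sum over path segments."""
    return sum(f * d for f, d in zip(forces, distances))

def posterior_reward(cost, candidate_rewards):
    """P(reward | agent acted), assuming the agent acts only if reward > cost,
    with a uniform prior over the candidate reward values."""
    consistent = [r for r in candidate_rewards if r > cost]
    return {r: (1 / len(consistent) if r in consistent else 0.0)
            for r in candidate_rewards}

easy_jump = work_done(forces=[2.0, 2.0], distances=[0.5, 0.5])      # low cost
steep_climb = work_done(forces=[8.0, 9.0, 10.0], distances=[1.0, 1.0, 1.0])

rewards = [1, 5, 10, 20, 40]
print("after an easy action:  ", posterior_reward(easy_jump, rewards))
print("after a costly action: ", posterior_reward(steep_climb, rewards))
```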
01:00:01.000 | I see I'm running a little bit behind
01:00:03.000 | on time, and I wanted to
01:00:05.000 | leave some time for discussion. So I'll just go
01:00:07.000 | very quickly through a couple of other things,
01:00:09.000 | and happy to stay around at the end
01:00:11.000 | for discussion.
01:00:13.000 | What I
01:00:15.000 | showed you here was the science. Where does the
01:00:17.000 | engineering go? So one thing you can
01:00:19.000 | do with this is, say, build
01:00:21.000 | a machine system that can look not at a little
01:00:23.000 | animated cartoon like these baby experiments,
01:00:25.000 | but a real person doing something. And again,
01:00:27.000 | combine physical cost and
01:00:29.000 | constraints of actions with some understanding
01:00:31.000 | of the agent's utilities. That's the
01:00:33.000 | math of planning
01:00:35.000 | to figure out what they wanted.
01:00:37.000 | So look in this scene here, and
01:00:39.000 | see if you can judge which object
01:00:41.000 | the woman is reaching for. So you can see
01:00:43.000 | there's a grid of
01:00:45.000 | 4x4 objects. There's 16 objects
01:00:47.000 | here, and she's going to be reaching for one of them.
01:00:49.000 | It's going to play in slow motion,
01:00:51.000 | but raise your hand when you know which one she's reaching
01:00:53.000 | for. So just watch and raise your hand when
01:00:55.000 | you know which one she wants.
01:00:57.000 | So most of the hands are up by now.
01:01:02.000 | And notice, I was looking at your
01:01:04.000 | hands, not here, but what happened is
01:01:06.000 | most of the hands were up about the time
01:01:08.000 | when that dashed
01:01:10.000 | line shot up.
01:01:12.000 | That's not human data. You
01:01:14.000 | provided the data. This is our model. So our
01:01:16.000 | model is predicting, more or less, when
01:01:18.000 | you're able to say what her goal was.
01:01:20.000 | It's well before she actually touched the object.
01:01:22.000 | How does the model work? Again, I'll skip the details,
01:01:24.000 | but it does the same kind of
01:01:26.000 | thing that our models of those infants
01:01:28.000 | did. Namely, but in this
01:01:30.000 | case it does it with a full-body model from
01:01:32.000 | robotics. So we use what's called the MuJoCo physics
01:01:34.000 | engine, which is a standard tool
01:01:36.000 | in robotics for planning physically
01:01:38.000 | efficient reaches of, say, a humanoid robot.
01:01:40.000 | And we say, we can give
01:01:42.000 | this planner program
01:01:44.000 | a goal object as input. We can give it each of
01:01:46.000 | the possible goal objects as input and say,
01:01:48.000 | "Plan the most physically efficient action,"
01:01:50.000 | so the one that uses the least energy to get to
01:01:52.000 | that object. And then we can do a Bayesian
01:01:54.000 | inference. This is the probabilistic inference part.
01:01:56.000 | The program is the MuJoCo
01:01:58.000 | planner. But then we can
01:02:00.000 | say, "I want to do Bayesian inference
01:02:02.000 | to work backwards from what I observed,
01:02:04.000 | which was the action, to the input to that
01:02:06.000 | program. What goal was provided as
01:02:08.000 | input to the planner?" And here you can see
01:02:10.000 | the full array of 4x4
01:02:12.000 | possible inputs, and those bars that are
01:02:14.000 | moving up and down, that's the Bayesian posterior
01:02:16.000 | probability of how likely each
01:02:18.000 | of those was to be the goal. And what
01:02:20.000 | you can see is it converges on the right answer,
01:02:22.000 | at least, well, it turns out to be the ground truth
01:02:24.000 | right answer, but it's also the right answer according to what people
01:02:26.000 | think, with about the same kind of
01:02:28.000 | data that people took.
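To give a flavor of the inverse-planning computation (a generic sketch, not the actual MuJoCo-based model), one can score each candidate goal by how efficient the observed partial reach would be if that goal were the planner's input, and turn those scores into a Bayesian posterior over goals.

```python
# Generic Bayesian inverse planning: for each candidate goal, assume the agent
# approximately minimizes effort, score the observed partial reach under a
# Boltzmann ("softly rational") likelihood, and combine with a uniform prior.
import math

def heading_cost(hand, next_hand, goal):
    """Extra distance incurred by the observed step, relative to heading
    straight for the goal (0 if the step is perfectly efficient)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    return dist(hand, next_hand) + dist(next_hand, goal) - dist(hand, goal)

def goal_posterior(trajectory, goals, beta=10.0):
    """P(goal | observed trajectory) with likelihood ∝ exp(-beta * extra cost)."""
    scores = []
    for g in goals:
        extra = sum(heading_cost(trajectory[i], trajectory[i + 1], g)
                    for i in range(len(trajectory) - 1))
        scores.append(math.exp(-beta * extra))
    total = sum(scores)
    return [s / total for s in scores]

# A 2x2 toy "array of objects" and a partial reach drifting toward the top-right.
goals = [(0, 0), (0, 1), (1, 0), (1, 1)]
partial_reach = [(0.5, 0.2), (0.6, 0.4), (0.75, 0.6)]
for g, p in zip(goals, goal_posterior(partial_reach, goals)):
    print(g, round(p, 3))
```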
01:02:30.000 | Now you might say, "Well, okay, sure,
01:02:32.000 | if I just wanted to build a system that could detect
01:02:34.000 | what somebody was reaching for, I could generate
01:02:36.000 | a training data set of this sort of
01:02:38.000 | scene and train something up to
01:02:40.000 | analyze patterns of motion." But again,
01:02:42.000 | because the engine in your head actually
01:02:44.000 | does something we think more like this,
01:02:46.000 | it does what we call inverse planning over a
01:02:48.000 | physics model, it can apply to much more
01:02:50.000 | interesting scenes that you haven't really seen
01:02:52.000 | much of before. So take the scene on the left,
01:02:54.000 | where again you see somebody reaching
01:02:56.000 | for one of a 4x4 array of objects,
01:02:58.000 | but what you see is a strange kind of reach.
01:03:00.000 | Can you see why he's doing a strange reach?
01:03:02.000 | Up there, it's a little
01:03:04.000 | small, but you can see that he's
01:03:06.000 | reaching over something, right? It's
01:03:08.000 | actually a pane of glass, right? Do you see that?
01:03:10.000 | And then there's this other guy
01:03:12.000 | helping him, who sees what
01:03:14.000 | he wants and hands him the thing he wants.
01:03:16.000 | So how does the guy
01:03:18.000 | in the foreground see the other guy's goal?
01:03:20.000 | How does he infer his goal
01:03:22.000 | and know how to help him? And then how do we look
01:03:24.000 | at the two of them and figure out who's trying to help
01:03:26.000 | who? Or that in a scene like this one
01:03:28.000 | here, that it's not somebody trying
01:03:30.000 | to help somebody, but rather the opposite.
01:03:32.000 | So here's a model on the left
01:03:34.000 | of how that might work, and we think this is the
01:03:36.000 | kind of model needed to tackle this sort of challenge
01:03:38.000 | here. Basically, it's a model
01:03:40.000 | -- we take this model of
01:03:42.000 | planning, sort of maximal expected utility
01:03:44.000 | planning, which you can run backwards,
01:03:46.000 | but then we recursively nest these
01:03:48.000 | models inside each other. So we say, an agent
01:03:50.000 | is helping another agent. If this agent
01:03:52.000 | is acting, apparently, to us,
01:03:54.000 | seems to be maximizing an expected utility,
01:03:56.000 | that's a positive
01:03:58.000 | function of that agent's expectation
01:04:00.000 | about another agent's expected utility, and that's what it
01:04:02.000 | means to be a helper. Hindering
01:04:04.000 | is sort of the opposite, if one seems to be trying
01:04:06.000 | to lower somebody else's utility.
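A bare-bones sketch of the nested-utility idea: the other agent's utility is its reward minus its effort, and a helper's utility is a positive function of that quantity while a hinderer's is a negative one. The scenario, actions, and numbers below are invented purely to illustrate the recursion.

```python
# Nested utilities: the social agent's utility is +1 (helper) or -1 (hinderer)
# times the other agent's utility, and it picks the action maximizing that.
def other_agents_utility(world):
    reward = 10.0 if world["goal_reached"] else 0.0
    return reward - world["other_agent_effort"]

def social_utility(world, stance):
    # stance = +1 for a helper, -1 for a hinderer
    return stance * other_agents_utility(world)

def choose_action(actions, stance):
    """actions: dict mapping action name -> resulting world state."""
    return max(actions, key=lambda a: social_utility(actions[a], stance))

actions = {
    "open the cabinet":  {"goal_reached": True,  "other_agent_effort": 1.0},
    "do nothing":        {"goal_reached": True,  "other_agent_effort": 6.0},
    "block the cabinet": {"goal_reached": False, "other_agent_effort": 4.0},
}
print("helper chooses:  ", choose_action(actions, stance=+1))
print("hinderer chooses:", choose_action(actions, stance=-1))
```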
01:04:08.000 | And we've used these same
01:04:10.000 | kind of models to also describe
01:04:12.000 | infants' understanding of helping and hindering
01:04:14.000 | in a range of scenes. I'll just
01:04:16.000 | say one last word about learning,
01:04:18.000 | because everybody wants to know about learning, and
01:04:20.000 | the key thing here, and it's
01:04:22.000 | definitely part of any picture of AGI,
01:04:24.000 | but the thought I want to leave
01:04:26.000 | you on is really about what learning is about.
01:04:28.000 | It'll be just a few more slides, and then I'll stop, I promise.
01:04:30.000 | None of the models
01:04:32.000 | I showed you so far really did any learning.
01:04:34.000 | They certainly didn't do any task-specific learning.
01:04:36.000 | We set up a probabilistic program
01:04:38.000 | and then we let it do inference. Now that's
01:04:40.000 | not to say that we don't think people learn to do these
01:04:42.000 | things. We do. But the real learning
01:04:44.000 | goes on when you're much younger.
01:04:46.000 | Everything I showed you in basic form
01:04:48.000 | even a one-year-old baby can do.
01:04:50.000 | The basic learning goes on
01:04:52.000 | to support these kinds of abilities.
01:04:54.000 | Not that there isn't learning beyond one year, but
01:04:56.000 | the basic way you learn to, say, solve these physics
01:04:58.000 | problems is what goes on
01:05:00.000 | in the brain of a child
01:05:02.000 | between zero and twelve months. So this
01:05:04.000 | is just an example of some phenomena
01:05:06.000 | that come from the literature on infant cognitive
01:05:08.000 | development. These are very rough timelines.
01:05:10.000 | You can take pictures of this if you like.
01:05:12.000 | This is always a popular slide because it
01:05:14.000 | really is quite inspiring, I think, and I can give you
01:05:16.000 | lots of literature pointers, but I'm
01:05:18.000 | summarizing in very broad strokes with big
01:05:20.000 | error bars what we've learned
01:05:22.000 | in the field of infant cognitive development
01:05:24.000 | about when and how
01:05:26.000 | kids seem to come to at least
01:05:28.000 | a certain understanding of basic aspects of physics.
01:05:30.000 | So if you really
01:05:32.000 | want to study how people learn to be intelligent,
01:05:34.000 | a lot of what you have to study are kids
01:05:36.000 | at this age. You have to study what's already
01:05:38.000 | in their brain at zero months and
01:05:40.000 | what they learn and how they learn between four,
01:05:42.000 | six, eight, ten, twelve, and so on, and
01:05:44.000 | on up beyond that.
01:05:46.000 | Now, effectively
01:05:48.000 | what that amounts to, we think, is
01:05:50.000 | if what you're learning is something like
01:05:52.000 | let's say an intuitive game
01:05:54.000 | physics engine to capture these basic
01:05:56.000 | abilities, then what we need, if we're going to
01:05:58.000 | try to reverse engineer that, is what
01:06:00.000 | you might think of as a program learning program.
01:06:02.000 | If your knowledge is in the form of a program,
01:06:04.000 | then you have to have programs that build other programs.
01:06:06.000 | This is what I was talking about at the beginning
01:06:08.000 | about learning as building models of the
01:06:10.000 | world. Or ultimately, if you think
01:06:12.000 | what we start off with is something like a game
01:06:14.000 | engine that can play any game,
01:06:16.000 | then what you have to learn is the program of the game
01:06:18.000 | that you're actually playing, or the many different games
01:06:20.000 | that you might be playing over your life. So think
01:06:22.000 | of learning as like programming the game
01:06:24.000 | engine in your head to fit with your experience
01:06:26.000 | and to fit with the possible
01:06:28.000 | actions that you seem like you can take.
01:06:30.000 | Now this is what you could call the hard problem
01:06:32.000 | of learning if you come to learning from, say, neural
01:06:34.000 | networks or other tools in machine learning.
01:06:36.000 | So what makes machine, makes most
01:06:38.000 | of machine learning go right now, and certainly what
01:06:40.000 | makes neural networks so appealing,
01:06:42.000 | is that you can set up basically a big
01:06:44.000 | function approximator that can approximate
01:06:46.000 | many of the functions you might want to do in
01:06:48.000 | a certain application or task, but
01:06:50.000 | in a way that's end-to-end differentiable
01:06:52.000 | and with a meaningful cost function.
01:06:54.000 | So you can have one of these nice optimization
01:06:56.000 | landscapes, you can compute the gradients and basically
01:06:58.000 | just roll downhill until you get to
01:07:00.000 | an optimal solution. But
01:07:02.000 | if you're talking about learning as something like search
01:07:04.000 | in the space of programs, we don't
01:07:06.000 | know how to do anything like that yet. We don't know
01:07:08.000 | how to set this up as any kind of a nice
01:07:10.000 | optimization problem with any notion of smoothness
01:07:12.000 | or gradients. Rather
01:07:14.000 | what we need is, instead of
01:07:16.000 | learning as like rolling downhill effectively,
01:07:18.000 | a process which just, if you're
01:07:20.000 | willing to wait long enough,
01:07:22.000 | some simple algorithm
01:07:24.000 | will take care of. Think of what we
01:07:26.000 | call the idea of learning as programming.
01:07:28.000 | There's a popular metaphor in
01:07:30.000 | cognitive development called the child as scientist,
01:07:32.000 | which emphasizes children
01:07:34.000 | as active theory builders and children's
01:07:36.000 | play as a kind of causal
01:07:38.000 | experimentation. But this is the
01:07:40.000 | algorithmic complement to that, what we call
01:07:42.000 | the child as coder, or around MIT we'll say
01:07:44.000 | the child as hacker. But the rest of the world
01:07:46.000 | if you say child as hacker, they think of
01:07:48.000 | someone who breaks into your email and steals
01:07:50.000 | your credit card numbers. We all know that hacking
01:07:52.000 | is making your code more awesome.
01:07:54.000 | If your
01:07:56.000 | knowledge is some kind of code, or
01:07:58.000 | library of programs, then learning
01:08:00.000 | is all the ways that a child hacks on
01:08:02.000 | their code to make it more awesome.
01:08:04.000 | More awesome can mean more accurate, but it
01:08:06.000 | can also mean faster, more elegant,
01:08:08.000 | more transportable to other
01:08:10.000 | applications or their tasks, more explainable
01:08:12.000 | to others, maybe just more entertaining.
01:08:14.000 | Children have all
01:08:16.000 | of those goals in learning. And the activities
01:08:18.000 | by which they make their code more awesome
01:08:20.000 | also correspond to many
01:08:22.000 | of the activities of coding.
01:08:24.000 | So think about all the ways on a day
01:08:26.000 | to day basis you might make your code more
01:08:28.000 | awesome.
01:08:30.000 | You might have a big library
01:08:32.000 | of existing functions with some parameters
01:08:34.000 | that you can tune on a data set. That's basically
01:08:36.000 | what you do with backprop or stochastic gradient
01:08:38.000 | descent in training a deep learning system.
01:08:40.000 | But think about all the ways in which you might
01:08:42.000 | actually modify the underlying functions. So write
01:08:44.000 | new code, or take old code from
01:08:46.000 | some other thing and map it over here,
01:08:48.000 | or make a whole new library of code, or refactor
01:08:50.000 | your code to some other
01:08:52.000 | basis that will
01:08:54.000 | work more robustly and be more extensible.
01:08:56.000 | Or transpiling, or compiling,
01:08:58.000 | or even just commenting
01:09:00.000 | your code, or asking someone else
01:09:02.000 | for their code. Again, these are all
01:09:04.000 | ways that we make our code more awesome,
01:09:06.000 | and children's learning has analogs to all of
01:09:08.000 | these that we would want to understand
01:09:10.000 | as an engineer from an algorithmic point of view.
01:09:12.000 | So in our group we've been working
01:09:14.000 | on various early steps towards
01:09:16.000 | this. And again, we don't have anything like
01:09:18.000 | program writing programs at
01:09:20.000 | the level of children's learning algorithms.
01:09:22.000 | But one example of something that we did in
01:09:24.000 | our group, which you might not have thought of being about this,
01:09:26.000 | but it's definitely the AI work we did
01:09:28.000 | that got the most attention
01:09:30.000 | in the last couple of years from our group.
01:09:32.000 | We had this paper that was in Science, it was actually
01:09:34.000 | on the cover of Science, sort of just
01:09:36.000 | hit the market at the right time
01:09:38.000 | if you like, and it got about 100 times more
01:09:40.000 | publicity than anything else I've ever done,
01:09:42.000 | which is partly a testament to the really great
01:09:44.000 | work that Brendan Lake, who was the first author, did
01:09:46.000 | for his PhD here, but much
01:09:48.000 | more so just about the hunger for AI systems
01:09:50.000 | at the time when we published this in 2015.
01:09:52.000 | And we built a machine system
01:09:54.000 | that, the way we described it, was
01:09:56.000 | doing human level concept learning for
01:09:58.000 | simple, very simple visual
01:10:00.000 | concepts, these handwritten characters in many
01:10:02.000 | of the world's alphabets. For those of you who know
01:10:04.000 | the famous MNIST dataset, the dataset
01:10:06.000 | of handwritten digits 0 through 9,
01:10:08.000 | that drove
01:10:10.000 | so much good research in deep learning
01:10:12.000 | and pattern recognition. It did that
01:10:14.000 | not because Yann LeCun, who put that together,
01:10:16.000 | or Geoff Hinton, who did a lot of work on deep learning
01:10:18.000 | with MNIST, they weren't interested
01:10:20.000 | fundamentally in character recognition,
01:10:22.000 | they saw that as a very simple testbed
01:10:24.000 | for developing more general ideas.
01:10:26.000 | And similarly, we did this work on getting
01:10:28.000 | machines to do a kind
01:10:30.000 | of one-shot learning of generative models
01:10:32.000 | also to develop
01:10:34.000 | more general ideas. We saw this as learning
01:10:36.000 | very simple, little mini, probabilistic
01:10:38.000 | programs. In this case, what are those programs?
01:10:40.000 | They're the programs you use to draw a character.
01:10:42.000 | So ask yourself, how can you look at any one
01:10:44.000 | of these characters and see, in a sense,
01:10:46.000 | how somebody might draw it? The way we tested
01:10:48.000 | this in our system was this little
01:10:50.000 | visual Turing test, where we showed
01:10:52.000 | people one character in a novel alphabet
01:10:54.000 | and we said, "Draw another one."
01:10:56.000 | And then we compared nine people, like say, on the left
01:10:58.000 | and nine samples from our machine,
01:11:00.000 | say, on the right, and we said,
01:11:02.000 | we asked other people, "Could you tell which
01:11:04.000 | was the human drawing another example, or imagining
01:11:06.000 | another example, and which was the machine?"
01:11:08.000 | And people couldn't tell. When I said, "One's on the left,
01:11:10.000 | one's on the right," I don't actually remember.
01:11:12.000 | And on different ones, you can see if you can tell. It's very
01:11:14.000 | hard to tell. Can you tell which is, for each
01:11:16.000 | one of these characters, which new set of
01:11:18.000 | examples were drawn by a human versus a machine?
01:11:20.000 | Here's the right answer.
01:11:22.000 | And probably you couldn't tell.
01:11:24.000 | The way we did this was by assembling
01:11:26.000 | a simple kind of program learning program.
01:11:28.000 | So we basically said, when you draw
01:11:30.000 | a character, you're assembling strokes and
01:11:32.000 | substrokes with goals and subgoals
01:11:34.000 | that produce ink on the page.
01:11:36.000 | And when you see a character, you're working backwards
01:11:38.000 | to figure out, what was the program, the most
01:11:40.000 | efficient program that did that? So you're basically
01:11:42.000 | inverting a probabilistic program,
01:11:44.000 | doing Bayesian inference to the program
01:11:46.000 | most likely to have generated what you saw.
01:11:48.000 | This is one small step,
01:11:50.000 | we think, towards being able to learn
01:11:52.000 | programs, to being able to learn something ultimately
01:11:54.000 | like a whole game engine program.
01:11:56.000 | The last thing I'll leave you with is just a pointer
01:11:58.000 | to sort of work in action. So this
01:12:00.000 | is some work being done by a current PhD
01:12:02.000 | student who works partly with me, but also with
01:12:04.000 | Armando Solar-Lezama in CSAIL.
01:12:06.000 | This is Kevin Ellis. It's an example of
01:12:08.000 | what's now, I think, again, an emerging
01:12:10.000 | exciting area in AI, well
01:12:12.000 | beyond anything that we're doing,
01:12:14.000 | is combining techniques from
01:12:16.000 | where Armando comes from, which is the world of
01:12:18.000 | programming languages, not machine learning or
01:12:20.000 | AI, but tools from programming
01:12:22.000 | languages which can be used to automatically
01:12:24.000 | synthesize code, okay, with
01:12:26.000 | the machine learning toolkit, in this case a kind of
01:12:28.000 | Bayesian minimum description length
01:12:30.000 | idea, to be able to make, again,
01:12:32.000 | what is really one small step towards
01:12:34.000 | machines that can learn programs by
01:12:36.000 | basically trying to efficiently find
01:12:38.000 | the shortest, simplest program which can
01:12:40.000 | capture some data set.
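A minimal sketch of the Bayesian minimum-description-length criterion mentioned here: score each candidate program by the bits needed to write it down plus the bits needed to encode whatever data it fails to predict, and keep the cheapest one. The candidate "programs" and the bit-costing scheme below are invented for illustration.

```python
# Minimum description length for program learning: best program minimizes
# (bits to describe the program) + (bits to encode the data it gets wrong).
# Here "programs" are trivial integer-sequence rules scored by brute force.
data = [3, 6, 9, 12, 15, 18]

def description_length(program_source, predictions, data, bits_per_residual=8):
    program_bits = 8 * len(program_source)           # crude: 8 bits per character
    residual_bits = sum(0 if p == d else bits_per_residual
                        for p, d in zip(predictions, data))
    return program_bits + residual_bits

candidates = {
    "n*3":               lambda n: n * 3,
    "n*3 if n<4 else 0": lambda n: n * 3 if n < 4 else 0,
    "0":                 lambda n: 0,
}

scores = {
    src: description_length(src, [f(n) for n in range(1, len(data) + 1)], data)
    for src, f in candidates.items()
}
best = min(scores, key=scores.get)
print(scores)
print("shortest adequate program:", best)
```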
01:12:42.000 | So we think by combining these kinds of tools,
01:12:44.000 | in this case, let's say, from Bayesian inference
01:12:46.000 | over programs with a number of tools
01:12:48.000 | that have been developed in other areas
01:12:50.000 | of computer science that don't look anything
01:12:52.000 | or haven't been considered to be machine learning
01:12:54.000 | or AI, like programming languages,
01:12:56.000 | it's one of the many ways that going
01:12:58.000 | forward we're going to be able to build smarter, more
01:13:00.000 | human-like machines. So just
01:13:02.000 | to end then, what I've tried to tell you
01:13:04.000 | here is, first of all,
01:13:06.000 | to identify the ways in which human intelligence
01:13:08.000 | goes beyond pattern recognition
01:13:10.000 | to really all these activities of modeling
01:13:12.000 | the world, okay, to give you a sense
01:13:14.000 | of some of the domains where we can start to study this
01:13:16.000 | in common sense scene understanding, for example,
01:13:18.000 | or in
01:13:22.000 | one-shot learning, for example, like what we were just doing
01:13:24.000 | there, or learning as programming the engine in your
01:13:26.000 | head, okay, and
01:13:28.000 | to give you a sense of some of the technical
01:13:30.000 | tools, probabilistic programs,
01:13:32.000 | program synthesis, game engines, for
01:13:34.000 | example, as well as a little bit of deep learning
01:13:36.000 | that, bringing together, we're starting
01:13:38.000 | to be able to make these things real.
01:13:40.000 | Now, that's the science
01:13:42.000 | agenda and the reverse engineering agenda,
01:13:44.000 | but think about, for those of you who are interested in technology,
01:13:46.000 | what are the many big AI
01:13:48.000 | frontiers that this opens up?
01:13:50.000 | So the one I'm most excited about
01:13:52.000 | is this idea which I've highlighted
01:13:54.000 | here in our big research agenda. This is the one
01:13:56.000 | I'm most excited about to work on for the,
01:13:58.000 | you know, it could be the rest of my career, honestly,
01:14:00.000 | but it's really what is
01:14:02.000 | the oldest and maybe the best dream
01:14:04.000 | of AI researchers of how to
01:14:06.000 | build a human-like intelligence system,
01:14:08.000 | a real AGI system.
01:14:10.000 | It's the idea that Turing proposed when he proposed
01:14:12.000 | the Turing test, or Marvin Minsky proposed
01:14:14.000 | this at different times in his life, or many people have
01:14:16.000 | proposed this, right, which is to build a system
01:14:18.000 | that grows into intelligence the way a human
01:14:20.000 | does, that starts like a baby and learns like
01:14:22.000 | a child, and I've tried to show you how we're
01:14:24.000 | starting to be able to understand those
01:14:26.000 | things. What a baby's mind starts with,
01:14:28.000 | how children actually learn,
01:14:30.000 | and looking forward, we might
01:14:32.000 | imagine that someday we'll be able to build machines
01:14:34.000 | that can do this. I think we can actually start working
01:14:36.000 | on this right now, and
01:14:38.000 | that's something that we're doing in our group.
01:14:40.000 | So if that kind of thing excites you, then
01:14:42.000 | I encourage you to work on it, maybe even with us,
01:14:44.000 | or if any one of these other activities of human
01:14:46.000 | intelligence excite you, I
01:14:48.000 | think taking the kind of science-based
01:14:50.000 | reverse engineering approach that we're doing
01:14:52.000 | and then trying to put that into
01:14:54.000 | engineering practice,
01:14:56.000 | this is not just a
01:14:58.000 | possible route, but I think it's
01:15:00.000 | quite possibly the most valuable route
01:15:02.000 | that you could work on right now
01:15:04.000 | to try to actually achieve at least some kind
01:15:06.000 | of artificial general intelligence,
01:15:08.000 | especially the kind of intelligence
01:15:10.000 | AI system that's going to live
01:15:12.000 | in a human world and interact with humans.
01:15:14.000 | There's many kinds of AI systems that could live in
01:15:16.000 | worlds of data that none of us can understand or will
01:15:18.000 | ever live in ourselves, but if you want to build
01:15:20.000 | machines that can live in our world and interact with
01:15:22.000 | us the way we are used to interacting
01:15:24.000 | with other people, then I think
01:15:26.000 | this is a route that you should consider.
01:15:28.000 | Thank you.
01:15:30.000 | [Applause]
01:15:38.000 | Hi there.
01:15:40.000 | So, earlier in the talk you expressed some skepticism
01:15:42.000 | about whether or not industry
01:15:44.000 | would get us to understanding human-level intelligence.
01:15:46.000 | It seems that there's a couple of trends
01:15:48.000 | that favour industry. One is that
01:15:50.000 | industry is better than academia at accumulating
01:15:52.000 | resources and ploughing back into
01:15:54.000 | the topic, and it seems at the
01:15:56.000 | moment we've got a bit of brain drain going on
01:15:58.000 | from academia into industry,
01:16:00.000 | and that seems like an ongoing trend.
01:16:02.000 | If you look at something like learning to fly
01:16:04.000 | or learning to fly into space,
01:16:06.000 | then it looks like the story is
01:16:08.000 | one of industry kind of taking over
01:16:10.000 | the field and going
01:16:12.000 | off on its own a little bit.
01:16:14.000 | Academics still have a role,
01:16:16.000 | but industry kind of dominates.
01:16:18.000 | Is industry going to overtake the field, do you think?
01:16:20.000 | Well, that's a really good question, and it's got
01:16:22.000 | several good questions packed into one there.
01:16:24.000 | I didn't mean to say,
01:16:26.000 | and this wasn't meant to be, "Go academia,
01:16:28.000 | bad industry."
01:16:30.000 | What I tried to say
01:16:32.000 | was that the approaches that are
01:16:34.000 | currently getting the most attention in industry
01:16:36.000 | are the ones that are really the most
01:16:38.000 | valuable right now for the short term.
01:16:40.000 | Any industry is really focused on what it can do,
01:16:42.000 | what the value propositions are
01:16:44.000 | on basically a two-year time scale at most.
01:16:46.000 | If you ask, say, Google researchers, to take the most
01:16:48.000 | prominent example, that's pretty much
01:16:50.000 | what they'll all tell you.
01:16:52.000 | Maybe things that might
01:16:54.000 | pay off initially in
01:16:56.000 | two years, but maybe take five years or more to really
01:16:58.000 | develop. But if you can't show that it's going to
01:17:00.000 | do something practical for us in two years
01:17:02.000 | in a way that matters for our bottom line,
01:17:04.000 | then it's not really worth doing.
01:17:06.000 | What I'm
01:17:08.000 | talking about is the technologies which right
01:17:10.000 | now industry sees as meeting
01:17:12.000 | that specification. What I'm
01:17:14.000 | saying is that right now I think
01:17:16.000 | that's not
01:17:18.000 | the route to something human-like,
01:17:20.000 | not the most valuable or promising
01:17:22.000 | route to human-like kinds of AI
01:17:24.000 | systems. But I hope that,
01:17:26.000 | as in the cases you mentioned,
01:17:28.000 | the basic research that we're doing now will be
01:17:30.000 | successful enough that it will
01:17:32.000 | get the attention of industry when the time is right.
01:17:34.000 | I hope
01:17:36.000 | at some point
01:17:38.000 | at least the engineering
01:17:40.000 | side will have to be done in industry,
01:17:42.000 | not just in academia.
01:17:44.000 | But you're also pointing to issues of like brain
01:17:46.000 | drain and other things like that.
01:17:48.000 | These are real issues confronting our community. I think
01:17:50.000 | everybody knows this and I'm sure this will come up
01:17:52.000 | multiple times here, which is
01:17:54.000 | I think we have to find
01:17:56.000 | ways to, even now, to combine
01:17:58.000 | the best of the ideas, the energy
01:18:00.000 | and the resources of academia and industry
01:18:02.000 | if we want to keep doing
01:18:04.000 | basically something interesting.
01:18:06.000 | If we just want to redefine
01:18:08.000 | AI to be whatever people currently
01:18:10.000 | call AI but scaled up,
01:18:12.000 | then fine, forget about it.
01:18:14.000 | Or if we just want to say, let me
01:18:16.000 | and people like me do what we're doing at
01:18:18.000 | what industry would consider a snail's pace on
01:18:20.000 | toy problems, okay, fine.
01:18:22.000 | But if we want to,
01:18:24.000 | if I want to take what I'm doing to
01:18:26.000 | the level that will really be
01:18:28.000 | paying off, the level that industry can
01:18:30.000 | appreciate or just that really has technological
01:18:32.000 | impact on a broad scale,
01:18:34.000 | or I think if industry wants
01:18:36.000 | to take what it's doing and really
01:18:38.000 | build machines that are actually intelligent
01:18:40.000 | or machine learning that actually
01:18:42.000 | learns like a person, then I think we need each other
01:18:44.000 | now and not just at some point in the future.
01:18:46.000 | So this is a general challenge for
01:18:48.000 | MIT and for everywhere and for Google.
01:18:50.000 | We just spent a few days
01:18:52.000 | talking to Google about exactly this issue.
01:18:54.000 | In fact, this was a talk I prepared
01:18:56.000 | partly for that purpose.
01:18:58.000 | We wanted to raise those issues, and
01:19:00.000 | I can think of some
01:19:02.000 | solutions
01:19:04.000 | to the problem
01:19:06.000 | of what you could call brain drain from the academic
01:19:08.000 | point of view, or what you could call
01:19:10.000 | narrowing into certain local minima from the
01:19:12.000 | industry point of view. But they will require
01:19:14.000 | the leadership of both
01:19:16.000 | academic institutions like MIT and
01:19:18.000 | companies like Google being creative about how they
01:19:20.000 | might work together in ways that are a little bit outside of
01:19:22.000 | their comfort zone. I hope that will start to happen
01:19:24.000 | including at MIT
01:19:26.000 | and at many other universities
01:19:28.000 | and at companies like Google and many others
01:19:30.000 | and I think we need it to happen for the health of
01:19:32.000 | all parties concerned.
01:19:34.000 | - Okay, thank you very much. - Thanks.
01:19:36.000 | - I'm curious about
01:19:38.000 | sort of the premise that you gave
01:19:40.000 | that one of the big gaps
01:19:42.000 | in getting to intelligence
01:19:44.000 | is the fact that we need to
01:19:46.000 | teach machines how to
01:19:48.000 | recognize models. And I'm
01:19:50.000 | curious as to what you think
01:19:52.000 | sort of non-
01:19:54.000 | goal-oriented
01:19:56.000 | cognitive activity comes into play
01:19:58.000 | there, things like feelings and emotions,
01:20:00.000 | and why
01:20:02.000 | you don't think that
01:20:04.000 | might be
01:20:06.000 | the most
01:20:08.000 | important question.
01:20:10.000 | - The reason emotions didn't appear
01:20:12.000 | on my slide, well,
01:20:14.000 | there are a few reasons, but the slide is only so big.
01:20:16.000 | I wanted the font to be big and readable for
01:20:18.000 | such an important slide. I have
01:20:20.000 | versions of my slide in which I do talk about that.
01:20:22.000 | Okay.
01:20:24.000 | It's not that I think feelings
01:20:26.000 | or emotions aren't important. I think they are important
01:20:28.000 | and I used to not have many insights
01:20:30.000 | about what to do about them,
01:20:32.000 | but actually partly based on some of my colleagues
01:20:34.000 | here at MIT,
01:20:36.000 | BCS, Laura Schulz and Rebecca Saxe,
01:20:38.000 | two of my cognitive colleagues who I
01:20:40.000 | work closely with, they've been
01:20:42.000 | starting to do research
01:20:44.000 | on how people understand emotions, both
01:20:46.000 | their own and others, and we've been starting to work
01:20:48.000 | with them on computational models. So that's actually something
01:20:50.000 | I'm actively interested in and even working on.
01:20:52.000 | But I would say, and again for those of you
01:20:54.000 | who study emotion or know about this, actually you're going to have
01:20:56.000 | Lisa coming in, right?
01:20:58.000 | She's going to basically say a version of the same thing,
01:21:00.000 | I think. The deepest way to understand,
01:21:02.000 | she's one of the world's experts on this,
01:21:04.000 | the deepest way to understand emotion is
01:21:06.000 | very much based on our mental models of
01:21:08.000 | ourselves, of the situation we're in, and of other people.
01:21:10.000 | I mean, again,
01:21:12.000 | Lisa will talk all about this,
01:21:14.000 | but if you think
01:21:16.000 | about emotion as just a very
01:21:18.000 | small set of what are sometimes called
01:21:20.000 | basic emotions, like being happy or angry
01:21:22.000 | or sad,
01:21:24.000 | those are a
01:21:26.000 | small number of them, right?
01:21:28.000 | There's usually only a few, right?
01:21:30.000 | You might see those
01:21:32.000 | as somehow very basic things
01:21:34.000 | that are opposed to
01:21:36.000 | some kind of cognitive activity.
01:21:38.000 | But think about all the different words
01:21:40.000 | we have for emotion, right?
01:21:42.000 | For example, think about
01:21:44.000 | a famous cognitive emotion like
01:21:46.000 | regret. What does it mean to feel regret
01:21:48.000 | or frustration, right?
01:21:50.000 | To know both for yourself
01:21:52.000 | when you're not just feeling
01:21:54.000 | kind of down or negative, but you're feeling
01:21:56.000 | regret, that means something like
01:21:58.000 | I have to feel like there's a situation
01:22:00.000 | that came out differently from how I
01:22:02.000 | hoped, and I realized I could have
01:22:04.000 | done something differently, right?
01:22:06.000 | So that means you have to be able to understand,
01:22:08.000 | you have to have a model, you have to be able to
01:22:10.000 | do a kind of counterfactual reasoning and to think
01:22:12.000 | oh, if only I had acted a different way, then I
01:22:14.000 | can predict that the world would have come out differently, and that's the
01:22:16.000 | situation I wanted, but instead it came out this other way,
01:22:18.000 | right? Or think about frustration
01:22:20.000 | again, that requires something like
01:22:22.000 | understanding, okay, I've tried a bunch of times, I thought
01:22:24.000 | this would work, but it doesn't seem to be working,
01:22:26.000 | maybe I'm ready to give up.
01:22:28.000 | Those are all very
01:22:30.000 | important human emotions.
01:22:32.000 | We need that to understand ourselves,
01:22:34.000 | to understand other people, to understand
01:22:36.000 | communication. But those are all filtered
01:22:38.000 | through the kinds of models of action
01:22:40.000 | that I was just talking about
01:22:42.000 | here, with these, say, cost-benefit analyses of action.
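(A hypothetical toy sketch of that point about regret, with made-up action names and utilities: regret is the gap between the outcome your model predicts for the best action you could have taken and the outcome of the action you actually took.)

```python
# Hypothetical toy: regret as counterfactual reasoning over a model of outcomes.
def regret(chosen_action, actions, predicted_utility):
    # predicted_utility: the agent's own model, mapping an action to expected utility.
    actual = predicted_utility(chosen_action)
    best_counterfactual = max(predicted_utility(a) for a in actions)
    # Regret is felt only when some other action would have turned out better.
    return max(0.0, best_counterfactual - actual)

# Made-up example: carrying an umbrella all day when it never rained.
utilities = {"take_umbrella": 0.2, "leave_umbrella": 1.0}
print(regret("take_umbrella", utilities, utilities.get))  # -> 0.8
```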
01:22:44.000 | So I'm just trying to say I think
01:22:46.000 | this is very basic stuff, but
01:22:48.000 | it will be the basis for building,
01:22:50.000 | I think, better engineering-style
01:22:52.000 | models of the full spectrum of human emotion
01:22:54.000 | beyond just, well, I'm feeling good or bad
01:22:56.000 | or scared, okay? And I think
01:22:58.000 | when you see Lisa, she will, in her own way,
01:23:00.000 | say something very similar.
01:23:02.000 | Interesting. Thanks.
01:23:04.000 | Yeah.
01:23:06.000 | Thanks, Josh, for your nice talk. So all of this is about
01:23:08.000 | human cognition and trying to build models
01:23:10.000 | to mimic that cognition, but
01:23:12.000 | how much could it help you to understand
01:23:14.000 | how the circuits implement those things?
01:23:16.000 | You mean like the circuits in the brain?
01:23:18.000 | Yeah.
01:23:20.000 | Is that what you work on by any chance?
01:23:22.000 | Sorry, what? Is that what you work on by any chance?
01:23:24.000 | Yeah. Yeah, I know. I'm kidding. Yeah.
01:23:26.000 | So in the Center for Brains, Minds, and Machines,
01:23:28.000 | as well as in Brain and Cognitive Science,
01:23:30.000 | yeah, I have a number of colleagues
01:23:32.000 | who study the actual hardware basis
01:23:34.000 | of this stuff in the brain, and that
01:23:36.000 | includes like the large-scale architecture
01:23:38.000 | of the brain, say like what Nancy Kanwisher
01:23:40.000 | and Rebecca Saxe study with functional brain imaging,
01:23:42.000 | or the more detailed circuitry, which usually
01:23:44.000 | requires recording from, say, non-human brains,
01:23:46.000 | right, at the level of individual neurons
01:23:48.000 | and connections between neurons. All right.
01:23:50.000 | So I'm very interested in those things,
01:23:52.000 | although it's not mostly what I work on, right?
01:23:54.000 | But I would say, you know, again, like in
01:23:56.000 | many other areas of science, certainly in neuroscience,
01:23:58.000 | the kind of work I'm talking about here
01:24:00.000 | in a sort of classic reductionist program
01:24:02.000 | sets the target for what we might look for.
01:24:04.000 | Like if I just want to go, I mean,
01:24:06.000 | I would, what I would assert, right,
01:24:08.000 | or my working conjunction,
01:24:10.000 | is that I would say, "Okay, so I'm going to
01:24:12.000 | do this, right, or my working conjecture
01:24:14.000 | is that if you do the kind of work
01:24:16.000 | that I'm talking about here,
01:24:18.000 | it gives you the right targets,
01:24:20.000 | or it gives you a candidate set of targets
01:24:22.000 | to look for, what are the neural circuits
01:24:24.000 | computing, right? Whereas if you just
01:24:26.000 | go in and just say, start poking
01:24:28.000 | around in the brain, or have some
01:24:30.000 | idea that what you're going to try to do
01:24:32.000 | is find the neural circuits which underlie
01:24:34.000 | behavior, without a sense of the computations
01:24:36.000 | needed to produce those behaviors,
01:24:38.000 | I think it's going to be very difficult
01:24:40.000 | to know what to look for,
01:24:42.000 | and to know when you've found
01:24:44.000 | even viable answers.
01:24:46.000 | So I think that's the standard kind of
01:24:48.000 | reductionist program,
01:24:50.000 | but it's not,
01:24:52.000 | I also think it's not
01:24:54.000 | one that is divorced from the study
01:24:56.000 | of neural circuits. It's also one,
01:24:58.000 | if you look at the broad picture of reverse
01:25:00.000 | engineering, it's one where
01:25:02.000 | neural circuits and understanding
01:25:04.000 | the circuits in the brain play
01:25:06.000 | an absolutely critical role, okay?
01:25:08.000 | I would say,
01:25:10.000 | when you look at the brain at the hardware level
01:25:12.000 | as an engineer, I'm mostly looking at
01:25:14.000 | the software level, right? But when you look at the hardware level,
01:25:16.000 | there are some remarkable properties.
01:25:18.000 | One remarkable property again is
01:25:20.000 | how much parallelism there is,
01:25:22.000 | and in many ways how fast the computations
01:25:24.000 | are, okay? Neurons are slow,
01:25:26.000 | but the computations of intelligence are very fast.
01:25:28.000 | So how do we get elements that are
01:25:30.000 | in some sense quite slow in their time
01:25:32.000 | constant to produce such intelligent
01:25:34.000 | behavior so quickly? That's a great mystery, and I think
01:25:36.000 | if we understood that, it would have payoff
01:25:38.000 | for building all sorts of
01:25:40.000 | basically application-embedded
01:25:42.000 | circuits, okay? But also
01:25:44.000 | maybe most important is the power consumption,
01:25:46.000 | and again, many people have
01:25:48.000 | noted this, right? If you look at
01:25:50.000 | the power consumption, the power that the brain
01:25:52.000 | consumes, like what did I eat today, okay?
01:25:54.000 | Almost nothing.
01:25:56.000 | My daughter, who's
01:25:58.000 | again, she's doing an internship here, she literally
01:26:00.000 | yesterday, all she ate was a burrito,
01:26:02.000 | and yet she wrote 300 lines of code
01:26:04.000 | for her internship project on
01:26:06.000 | a really cool computational linguistics
01:26:08.000 | project. So somehow she turned a burrito into
01:26:10.000 | a model of child language
01:26:12.000 | acquisition, okay? But how did she
01:26:14.000 | do that, or how do any of us do this, right?
01:26:16.000 | Where if you look at the power that we consume
01:26:18.000 | when we simulate even a very, very small
01:26:20.000 | chunk of cortex on our conventional
01:26:22.000 | hardware, or we do any kind of machine
01:26:24.000 | learning thing, we have systems which are
01:26:26.000 | very, very, very, very far
01:26:28.000 | from the power of the human brain computationally,
01:26:30.000 | but in terms of physical
01:26:32.000 | energy consumed,
01:26:34.000 | way past what any individual
01:26:36.000 | brain is doing. So how do we get circuitry
01:26:38.000 | of any sort, biological or
01:26:40.000 | just any physical circuits, to be as smart
01:26:42.000 | as we are with as little
01:26:44.000 | energy as we use? This is
01:26:46.000 | a huge problem for basically every
01:26:48.000 | area of engineering, right? If you want
01:26:50.000 | to have any kind
01:26:52.000 | of robot, the power consumption is a key
01:26:54.000 | bottleneck. Same for self-driving cars.
01:26:56.000 | If we want to build AI without
01:26:58.000 | contributing to global warming
01:27:00.000 | and climate change, let alone
01:27:02.000 | use AI to solve climate change, we really
01:27:04.000 | need to address these issues, and the brain
01:27:06.000 | is a huge guide there.
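(A back-of-the-envelope comparison with assumed, round numbers, not figures from the talk, just to give a sense of the gap being described.)

```python
# Rough, illustrative numbers only: a human brain runs on roughly 20 watts,
# a single modern GPU draws on the order of 300 watts, and a sizable
# training run uses many of them.
brain_watts = 20.0
gpu_watts = 300.0        # illustrative figure for one accelerator
num_gpus = 100           # illustrative cluster size
ratio = num_gpus * gpu_watts / brain_watts
print(f"cluster draws about {ratio:.0f}x the power of a brain")  # -> ~1500x
```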
01:27:08.000 | I think there are some people who are
01:27:10.000 | really starting to think about this. How can we, say, for example,
01:27:12.000 | build somehow brain-inspired
01:27:14.000 | computers which are
01:27:16.000 | very, very low power, but maybe only approximate?
01:27:18.000 | So I'm thinking here of Joe Bates.
01:27:20.000 | I don't know if any of you know Joe. He's
01:27:22.000 | been around MIT and other places
01:27:24.000 | for quite a while. Can I tell them about
01:27:26.000 | your company? So Joe has
01:27:28.000 | a startup in Kendall Square
01:27:30.000 | called Singular Computing, and they have some
01:27:32.000 | very interesting ideas, including some actual
01:27:34.000 | implemented technology
01:27:36.000 | for low-power, approximate computing
01:27:38.000 | in a sort of a brain-like way that might
01:27:40.000 | lead to possibly even, like, the ability
01:27:42.000 | to build something -- this is Joe's dream --
01:27:44.000 | to build something that's about the size of this table
01:27:46.000 | but that has a billion cores, a billion cores,
01:27:48.000 | and runs on a reasonable
01:27:50.000 | kind of power consumption. I would love to have
01:27:52.000 | such a machine. If anybody
01:27:54.000 | wants to help Joe build it, I think he'd love to talk
01:27:56.000 | to you. But it's
01:27:58.000 | one of a number of ideas. I mean,
01:28:00.000 | Google X, people are working on similar things.
01:28:02.000 | Probably most of the major chip companies
01:28:04.000 | are also inspired by this idea.
01:28:06.000 | And I think, even if you didn't
01:28:08.000 | think you were interested in the brain, if you want to build
01:28:10.000 | the kind of AI we're talking about
01:28:12.000 | and run it on physical hardware of any sort,
01:28:14.000 | and understanding how the brain's circuits
01:28:16.000 | compute what they do,
01:28:18.000 | what I'm talking about, with
01:28:20.000 | as little power as they do,
01:28:22.000 | I don't know any better place to look.
01:28:24.000 | It seems like a lot of
01:28:26.000 | the improvements in AI have been driven
01:28:28.000 | by increasing computational power.
01:28:30.000 | How far would you say --
01:28:32.000 | You mean like GPUs or CPUs?
01:28:34.000 | How far would you say we are
01:28:36.000 | from hardware that could run a general
01:28:38.000 | artificial intelligence?
01:28:40.000 | Of the kind that I'm talking about?
01:28:42.000 | Yeah, I don't know. I'll start with a billion cores
01:28:44.000 | and then we'll see.
01:28:46.000 | I mean, I think we're --
01:28:48.000 | I think there's no way to answer that
01:28:50.000 | question in a way that's software independent.
01:28:52.000 | I don't know how to do that.
01:28:54.000 | But I think that
01:28:56.000 | it's --
01:28:58.000 | I don't know.
01:29:00.000 | When you say how far are we, you mean how far
01:29:02.000 | am I with the resources I have right now?
01:29:04.000 | How far am I if Google decides
01:29:06.000 | to put all of its resources at my disposal
01:29:08.000 | like they might if I were working at DeepMind?
01:29:10.000 | I don't know
01:29:12.000 | the answer to that question.
01:29:14.000 | I think what we can say is
01:29:16.000 | this. Individual neurons --
01:29:18.000 | I mean, again, this goes back to another reason
01:29:20.000 | to study neural circuits.
01:29:22.000 | If you look at what we currently call neural networks
01:29:24.000 | in the AI side, the model of a neuron
01:29:26.000 | is this very, very simple thing.
01:29:28.000 | Individual neurons are not only
01:29:30.000 | much more complex, but have a lot more
01:29:32.000 | computational power. It's not clear how they
01:29:34.000 | use it or whether they use it,
01:29:36.000 | but I think it's just as likely that a neuron
01:29:38.000 | is something like a computer
01:29:40.000 | as that a neuron is something like a ReLU.
01:29:42.000 | Like, one neuron in your brain is
01:29:44.000 | more like a CPU node,
01:29:46.000 | maybe. And thus,
01:29:48.000 | the 10 billion or trillion --
01:29:50.000 | the large number of neurons
01:29:52.000 | in your brain --
01:29:54.000 | I think it's like 10 billion cortical pyramidal neurons
01:29:56.000 | or something -- might be like
01:29:58.000 | 10 billion cores.
01:30:00.000 | That's at least as plausible, I think, to me
01:30:02.000 | as any other estimate.
01:30:04.000 | I think we're definitely
01:30:06.000 | on the underside with very big error bars.
01:30:08.000 | I completely agree that --
01:30:10.000 | or if this is what you might be suggesting,
01:30:12.000 | and going back to my answer
01:30:14.000 | to your question, I don't think we're going to get to what
01:30:16.000 | I'm talking about or anything like a real brain
01:30:18.000 | scale without major
01:30:20.000 | innovations on the hardware side.
01:30:22.000 | It's interesting that what drove those
01:30:24.000 | innovations that support
01:30:26.000 | current AI was mostly not AI.
01:30:28.000 | It was the video game industry.
01:30:30.000 | When I point to
01:30:32.000 | the video game engine in your head, that's a similar
01:30:34.000 | thing that was driven by the video game industry
01:30:36.000 | on the software side. I think we
01:30:38.000 | should all play as many video games as we can
01:30:40.000 | and contribute to the growth of
01:30:42.000 | the video game industry.
01:30:44.000 | You can see this
01:30:46.000 | in very -- there are companies out there
01:30:48.000 | for example, there's a company called Improbable,
01:30:50.000 | which is a London company,
01:30:52.000 | a London-based startup, a pretty
01:30:54.000 | sizable startup at this point, which is building
01:30:56.000 | something that they call Spatial OS,
01:30:58.000 | which is -- it's not a
01:31:00.000 | hardware idea, but it's a kind of software
01:31:02.000 | idea for very, very big distributed computing
01:31:04.000 | environments to run
01:31:06.000 | much, much more complex, realistic
01:31:08.000 | simulations of the world for much more interesting, immersive,
01:31:10.000 | permanent video games. I think that's
01:31:12.000 | one thing that might -- hopefully that will lead to more
01:31:14.000 | fun, new kinds of games.
01:31:16.000 | But that's one example of where
01:31:18.000 | we might look to that industry to drive
01:31:20.000 | some of the -- just computer systems,
01:31:22.000 | really hardware and software
01:31:24.000 | systems that will
01:31:26.000 | take our game to the next level.
01:31:28.000 | Josh, understanding
01:31:30.000 | at the algorithmic level
01:31:32.000 | or cognitive level is about
01:31:34.000 | understanding learning, and the meaning of
01:31:36.000 | learning would be learning how to predict.
01:31:38.000 | But on the circuit level it's different.
01:31:40.000 | But at the what level? On the circuit level.
01:31:42.000 | Well, of course it's
01:31:44.000 | different, right? But already
01:31:46.000 | I think you made a mistake there, honestly.
01:31:48.000 | Like, you said the cognitive level is learning how to
01:31:50.000 | predict, but I'm not sure what you mean by that.
01:31:52.000 | There's many things you could mean, and what our cognitive
01:31:54.000 | science is about is learning which of those versions --
01:31:56.000 | like, I don't think it's learning how to predict. I think
01:31:58.000 | it's learning what you need to know to plan
01:32:00.000 | actions and to -- you know, all those things.
01:32:02.000 | Like, it's not just about predicting.
01:32:04.000 | Because there are things we can imagine that
01:32:06.000 | you would never predict because they would never
01:32:08.000 | happen unless we somehow make the world
01:32:10.000 | different. So generalization, sorry, not
01:32:12.000 | prediction. When your model can generalize.
01:32:14.000 | But especially in the transfer learning that
01:32:16.000 | you are interested in, a few hundred neurons in
01:32:18.000 | prefrontal cortex can generalize a lot.
01:32:20.000 | But not
01:32:22.000 | kind of a Bayesian model could do that.
01:32:24.000 | You said,
01:32:26.000 | but a Bayesian model won't do that?
01:32:28.000 | Or they don't do it the way a Bayesian model
01:32:30.000 | does? For sure, because that's at the abstract level.
01:32:32.000 | Well, I mean, how do you
01:32:34.000 | really know?
01:32:36.000 | I mean, what does it mean to say that some neurons
01:32:38.000 | do it? So maybe another way
01:32:40.000 | to put this is to say, look, we have a certain math
01:32:42.000 | that we use to capture these -- you
01:32:44.000 | could call it abstract, or I call it software
01:32:46.000 | level abstractions, right? I mean,
01:32:48.000 | all engineering is based on some kind of abstraction.
01:32:50.000 | But you might have a circuit level abstraction,
01:32:52.000 | a certain kind of hardware level that you're
01:32:54.000 | interested in describing the brain at.
01:32:56.000 | And I'm mostly working at or starting from a more
01:32:58.000 | software level of abstraction. They're all abstractions.
01:33:00.000 | We're not talking about molecules here.
01:33:02.000 | We're talking about some abstract notion
01:33:04.000 | of maybe a circuit, or of a program.
01:33:06.000 | Now it's a really interesting
01:33:08.000 | question. If I look at some circuits,
01:33:10.000 | how do I know what program they're implementing?
01:33:12.000 | If I look at the circuits in this machine,
01:33:14.000 | could I tell what program they're implementing?
01:33:16.000 | Well, maybe, but certainly it would be a lot easier
01:33:18.000 | if I knew something about what programs they might be implementing
01:33:20.000 | before I start to look at the circuitry. If I just
01:33:22.000 | looked at the circuitry without
01:33:24.000 | knowing what a program was, or what programs
01:33:26.000 | the thing might be doing, or what kind of
01:33:28.000 | programming components would be
01:33:30.000 | mappable to circuits in different ways,
01:33:32.000 | I don't even know how I'd begin
01:33:34.000 | to answer that question. So I think
01:33:36.000 | we've made some progress at understanding
01:33:38.000 | what neurons are doing in certain
01:33:40.000 | low-level parts of the sensory system
01:33:42.000 | and certain parts of the motor system, like
01:33:44.000 | primary motor cortex. Basically, the
01:33:46.000 | neurons that are closest to the inputs and outputs
01:33:48.000 | of the brain, where
01:33:50.000 | we don't--you could say
01:33:52.000 | we don't need the kind of software
01:33:54.000 | abstractions that I'm talking about,
01:33:56.000 | or where we sort of agree on
01:33:58.000 | what those things already are, so we can make
01:34:00.000 | enough progress on knowing what to look for
01:34:02.000 | and how to know when we've found it.
01:34:04.000 | But if you want to talk about flexible planning,
01:34:06.000 | things that are more like cognition, that
01:34:08.000 | go on in prefrontal cortex,
01:34:10.000 | at this point, I don't
01:34:12.000 | think that just by recording from those
01:34:14.000 | neurons, we're going to be able to answer those questions
01:34:16.000 | in a meaningful engineering way.
01:34:18.000 | A way that any engineer, software,
01:34:20.000 | hardware, whatever, could really say,
01:34:22.000 | "Yeah, okay, I get it. I get those insights in a way
01:34:24.000 | that I can engineer with." And that's what my goal
01:34:26.000 | is. So that's my goal to do
01:34:28.000 | at the software level, the hardware level, or the entire
01:34:30.000 | systems level, connecting them.
01:34:32.000 | And I think that we can do that
01:34:34.000 | by taking what we're doing and bringing it to contact
01:34:36.000 | with people studying neural circuits. But I don't
01:34:38.000 | think you can leave this level out
01:34:40.000 | and just go straight to the neural circuits. And I think
01:34:42.000 | the more progress we make,
01:34:44.000 | the more we can help people who are studying
01:34:46.000 | at the neural circuit level. And they can help us
01:34:48.000 | address these other engineering questions that we don't
01:34:50.000 | really have access to, like the power issue
01:34:52.000 | or the speed issue.
01:34:54.000 | Thanks. That was great. So maybe
01:34:56.000 | we give Josh a big hand.
01:34:58.000 | [Applause]