MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)
Chapters
0:00
10:14 A long-term research roadmap
17:15 CBMM Architecture of Visual Intelligence
40:02 The Roots of Common Sense
52:35 The intuitive physics engine
55:19 Prediction by simulation
00:00:04.320 |
leading the computational cognitive science group. 00:00:07.040 |
Among many other topics in cognition and intelligence, 00:00:11.000 |
he is fascinated with the question of how human beings learn 00:00:16.280 |
and how these insights can lead to building AI systems 00:00:19.680 |
that are much more efficient at learning from data. 00:00:44.000 |
to get to see perspectives on artificial intelligence 00:00:49.000 |
and other entities working on this great quest. 00:00:53.000 |
So I'm going to talk to you about some of the work 00:00:55.000 |
that we do in our group, but also I'm going to try to give 00:00:58.000 |
a broader perspective reflective of a number of MIT faculty, 00:01:07.000 |
Academically, I'm part of Brain and Cognitive Science, 00:01:12.000 |
But I'm also part of the Center for Brains, Minds, and Machines, 00:01:15.000 |
which is an NSF-funded center, a science and technology center, 00:01:20.000 |
devoted to the science and the engineering of intelligence. 00:01:26.000 |
We also have partners at Harvard and other academic institutions. 00:01:29.000 |
And again, what we stand for, I want to try to convey 00:01:31.000 |
some of the specific things we're doing in the center 00:01:34.000 |
and where we want to go with a vision that really is about 00:01:37.000 |
jointly pursuing the science, the basic science 00:01:40.000 |
of how intelligence arises in the human mind and brain, and the engineering of intelligence in machines. 00:01:50.000 |
And we deeply believe that these two projects go hand in hand. 00:01:56.000 |
Now, it's a really exciting time to be doing anything 00:02:00.000 |
for all the reasons that, you know, brought you all here. 00:02:05.000 |
We have all these ways in which AI is kind of finally here. 00:02:08.000 |
We finally live in the era of something like real practical AI. 00:02:18.000 |
But from my perspective, and I think maybe this reflects, 00:02:21.000 |
you know, why we distinguish what we might call AGI from AI, 00:02:29.000 |
which are systems that do things we used to think 00:02:32.000 |
that only humans could do, and now we have machines 00:02:34.000 |
that do them, often quite well, maybe even better 00:02:45.000 |
None of them have anything like common sense. 00:02:47.000 |
They have nothing like the flexible, general-purpose ability 00:02:51.000 |
to learn every one of these skills or tasks, right? 00:02:58.000 |
Each took teams of engineers working together often for a number of years, 00:03:00.000 |
often at great cost to somebody who's willing to pay for it. 00:03:14.000 |
because it doesn't even know what a game is, right? 00:03:18.000 |
What is it that makes every one of your brains-- 00:03:23.000 |
but any one of you can get behind the wheel of a car. 00:03:30.000 |
If she lived in California, she'd have a driver's license. 00:03:33.000 |
It's a little bit down the line for us here in Massachusetts. 00:03:36.000 |
But she didn't have to be specially engineered 00:03:45.000 |
by playing just a handful of games, basically. 00:03:56.000 |
I'll talk about the focus for us in our research, 00:04:00.000 |
and a lot of us, again, in CBMM, is summarized here. 00:04:10.000 |
in all these AI technologies, is many, many things. 00:04:13.000 |
But where the progress has been made most recently 00:04:20.000 |
is in pattern recognition, with deep learning but also other kinds of machine-learning technologies. 00:04:28.000 |
That means taking data and finding patterns in the data 00:04:41.000 |
and it's reasonable to say that deep learning as a technology 00:04:45.000 |
has really made great strides on pattern recognition. 00:04:50.000 |
But there's more to intelligence than solving the problems of pattern recognition. 00:04:57.000 |
In particular, it's about modeling the world. 00:05:00.000 |
And think about all the activities that a human does 00:05:07.000 |
but actually trying to explain and understand 00:05:11.000 |
Or to be able to imagine things that we've never seen, things quite different 00:05:16.000 |
from anything we've ever seen, but might want to see, 00:05:27.000 |
some kinds of learning can be thought of as pattern recognition 00:05:32.000 |
or weights in a neural net that are used for those purposes. 00:05:35.000 |
But many activities of learning are about building out new models, 00:05:38.000 |
right, either refining, reusing, improving old models, 00:05:41.000 |
or actually building fundamentally new models. 00:05:53.000 |
These activities are at the heart of human intelligence, 00:05:58.000 |
So I want to talk about the ways we're studying 00:06:08.000 |
Now, I think it's-- I want to be very honest up front 00:06:11.000 |
and to say this is just the beginning of a story, right? 00:06:16.000 |
that itself is a story that goes back decades. 00:06:18.000 |
I'll say a little bit about that history in a minute. 00:06:20.000 |
But where we are now is just looking forward to a future 00:06:23.000 |
when we might be able to capture these abilities, 00:06:25.000 |
you know, at a really mature engineering scale. 00:06:27.000 |
And I would say we are far from being able to capture 00:06:30.000 |
all the ways in which humans richly, flexibly, quickly 00:06:33.000 |
build models of the world at the kind of scale 00:06:37.000 |
either big tech companies like Google or Microsoft 00:06:44.000 |
And I think what I want to talk to you about here 00:06:53.000 |
is our approach of reverse engineering how intelligence works in the human mind and brain. 00:06:59.000 |
When we say reverse engineering, we're talking about science, 00:07:06.000 |
that if we approach cognitive science and neuroscience 00:07:08.000 |
like an engineer, where, say, the output of our science 00:07:10.000 |
isn't just a description of the brain or the mind in words, 00:07:13.000 |
but in the same terms that an engineer would use 00:07:17.000 |
then that will be both the basis for a much more rigorous science, 00:07:21.000 |
but also allow direct translation of those insights into engineering. 00:07:25.000 |
Now, I said before I talk a little about history, 00:07:30.000 |
Again, if part of what brought you here is deep learning, 00:07:33.000 |
and I know even if you've never heard of deep learning before, 00:07:45.000 |
to look back on the history of where did techniques 00:07:47.000 |
for deep learning come from, or reinforcement learning. 00:07:50.000 |
Those are the two tools in the current machine learning arsenal -- 00:07:55.000 |
things like back propagation or end-to-end stochastic gradient descent 00:07:58.000 |
or temporal difference learning or Q-learning. 00:08:02.000 |
Maybe some of you have read these original papers. 00:08:04.000 |
Here's the original paper by Rumelhart, Hinton, and colleagues 00:08:07.000 |
in which they introduced the back propagation algorithm 00:08:13.000 |
Here's the original perceptron paper by Rosenblatt, 00:08:15.000 |
which introduced the one-layer version of that architecture 00:08:20.000 |
Here's the first paper on the temporal difference learning method 00:08:24.000 |
for reinforcement learning from Sutton and Barto. 00:08:31.000 |
Here's the original Boltzmann machine paper -- for those of you who don't know that architecture, 00:08:34.000 |
think of a kind of probabilistic, undirected, multilayer perceptron. 00:08:42.000 |
if you know about current recurrent neural network architecture, 00:08:45.000 |
earlier, much simpler versions of the same idea 00:08:47.000 |
were proposed by Jeff Elman and his simple recurrent networks. 00:08:50.000 |
The reason I want to put up the original papers here 00:08:52.000 |
is for you to look at both when they were published and where. 00:08:57.000 |
So if you look at the dates, you'll see papers going back 00:08:59.000 |
to the '80s, but even the '60s, or even the 1950s. 00:09:05.000 |
Most of them were published in psychology journals. 00:09:08.000 |
So the journal Psychological Review, if you don't know it, 00:09:10.000 |
is like the leading journal of theoretical psychology 00:09:14.000 |
Or Cognitive Science, the journal of the Cognitive Science Society. 00:09:17.000 |
Or the backprop paper was published in Nature, 00:09:24.000 |
but that work came out of the Institute for Cognitive Science in San Diego. 00:09:26.000 |
So what you see here is already a long history 00:09:31.000 |
These are people who are in psychology or cognitive science departments 00:09:34.000 |
and publishing in those places, but by formalizing 00:09:37.000 |
even very basic insights about how humans might learn, 00:09:40.000 |
or how brains might learn, in the right kind of math, 00:09:44.000 |
that led to, of course, progress on the science side, 00:09:47.000 |
but it led to all the engineering that we see now. 00:09:51.000 |
We needed, of course, lots of innovations and advances 00:09:54.000 |
in computing hardware and software systems, right? 00:10:00.000 |
and it came from doing science like an engineer. 00:10:03.000 |
So what I want to talk about in our vision is, 00:10:09.000 |
what would we be looking back on now, or over this time scale? 00:10:12.000 |
Well, here's a long-term research roadmap that reflects 00:10:15.000 |
some of my ambitions and some of our center's goals, 00:10:20.000 |
We'd like to be able to address basic questions, 00:10:22.000 |
questions of what it is to be and to think like a human, 00:10:27.000 |
or meaning and language, or real learning, right? 00:10:38.000 |
And for each of these, there are basic scientific questions. 00:10:40.000 |
How do we become aware of the world and ourselves in it? 00:10:43.000 |
It starts with perception, but it really turns into awareness, 00:10:52.000 |
What really is a meaning, and how does a child grasp it? 00:10:58.000 |
Are they blank slates, or do they start with some kind of core knowledge? 00:11:03.000 |
These are just some of the questions that we're interested in. 00:11:08.000 |
How do you learn all the things you didn't directly experience, 00:11:11.000 |
right, but that somehow you got from the accumulation of culture and other people's knowledge? 00:11:19.000 |
How do you think of the new questions themselves? 00:11:23.000 |
These are all key activities of human intelligence. 00:11:27.000 |
where our models come from, what we do with our models, 00:11:31.000 |
And if we could get machines that could do these things, 00:11:33.000 |
well, again, on the bottom row, think of all the actual applications we could build. 00:11:36.000 |
Now, in our center, in both my own activities and a lot of what 00:11:40.000 |
my group does these days, and what a number of other 00:11:42.000 |
colleagues in the Center for Brains, Minds, and Machines do, 00:11:45.000 |
as well as, you know, very broadly people in BCS and CSAIL, 00:11:48.000 |
one place where we work on the beginnings of these problems, 00:11:52.000 |
like think 50 years, okay, maybe shorter, maybe longer, 00:11:55.000 |
I don't know, but think well beyond 10 years. 00:12:01.000 |
a lot of our focus is around visual intelligence, 00:12:05.000 |
Again, we can build on the successes of deep networks, 00:12:07.000 |
and a lot of pattern recognition and machine vision. 00:12:10.000 |
It's a good way to put these ideas into practice. 00:12:14.000 |
Also, the visual system in the brain, in humans and other 00:12:19.000 |
primates, is very clearly the best understood part of the brain, 00:12:21.000 |
and at a circuit level, it's the part of the brain we know in the most detail. 00:12:28.000 |
But even there, there's things which we still don't understand. 00:12:32.000 |
So here's an example of a basic problem in visual intelligence 00:12:35.000 |
that we and others in the Center are trying to solve. 00:12:39.000 |
Look around you, and you feel like there's a whole world 00:12:43.000 |
around you, and there is a whole world around you, 00:12:47.000 |
But the actual sense data that's coming in through 00:12:50.000 |
your eyes looks more like this photograph here: 00:12:54.000 |
mostly blurry except for a small region 00:12:58.000 |
So that corresponds biologically to the part of the image that falls on your fovea. 00:13:02.000 |
That's the central region of cells in the retina, 00:13:04.000 |
where you have really high resolution visual data. 00:13:07.000 |
The size of your fovea is roughly like if you hold out 00:13:09.000 |
your thumb at arm's length; it's a little bit bigger than your thumbnail. 00:13:13.000 |
Most of the image, in terms of the actual information 00:13:16.000 |
coming in in a bottom-up sense to your brain, is really low-resolution, 00:13:22.000 |
and then by saccading around or making a few eye movements, 00:13:25.000 |
you get a few glimpses, each not much bigger than the size of your thumb, 00:13:25.000 |
and your brain turns that into what feels like and really is a rich representation of the scene around you. 00:13:31.000 |
And when I say around you, I mean literally around you. 00:13:41.000 |
Without turning around -- nobody's allowed to turn around -- think about what's right behind you. 00:13:46.000 |
Now the answer's going to be different for different people, 00:13:52.000 |
I think there's a person pretty close behind me. 00:14:00.000 |
For people in the very back row, you know there isn't 00:14:02.000 |
a person behind you, and you're conscious of being in the back. 00:14:05.000 |
You might be conscious that there's a wall right behind you. 00:14:10.000 |
For those of you not in the very back, think about how far behind you 00:14:13.000 |
is the back, like where's the nearest wall behind you? 00:14:18.000 |
So I don't know, I'm pointing to someone there. 00:14:20.000 |
Can you see, say something if you think I'm pointing at you. 00:14:25.000 |
but I'm pointing to someone behind you, okay. 00:14:27.000 |
I'll point to you, yeah, I'm pointing to you. 00:14:31.000 |
No, you can't turn around, you've blown your chance. 00:14:36.000 |
okay, do you see I'm pointing to you there with the tie? 00:14:54.000 |
Now, how about you, how far is the nearest wall behind you? 00:15:09.000 |
in the metric system, I barely know, but yeah, I mean, 00:15:13.000 |
you're, each of you is surely not exactly right, 00:15:17.000 |
but you're certainly within an order of magnitude, 00:15:21.000 |
you know, you're probably, my guess is you're probably right 00:15:29.000 |
I mean, even if it's not, what did you say, 20 meters? 00:15:31.000 |
Even if it's not 20 meters, it's probably closer to 20 meters 00:15:36.000 |
than it is to 50 meters, so how do you know this? 00:15:54.000 |
and it's certainly not 10 or 20 or 50, right? 00:16:01.000 |
Okay, so again, think about how instantly, effortlessly, you know all of this. 00:16:12.000 |
Certainly, when we're talking about what's behind you in space, your sense is coarse, 00:16:17.000 |
but when it comes to reaching for things right in front of you, 00:16:20.000 |
you have very precise shape and physical property estimates. 00:16:28.000 |
And with other people, you perceive not just their bodies but something about what's in their head, right? 00:16:30.000 |
You track whether someone's paying attention to you 00:16:32.000 |
when you're talking to them, what they might want from you, 00:16:36.000 |
what they might be thinking about other people, okay? 00:16:42.000 |
and you can start to see how it turns into basic questions, 00:16:45.000 |
I think, of what we might call the beginnings of consciousness, 00:16:49.000 |
or at least our awareness of ourselves in the world, 00:16:55.000 |
but also other aspects of higher-level intelligence 00:16:57.000 |
and cognition that are not just about perception, 00:16:59.000 |
like symbols, right, to describe, even to ourselves, 00:17:02.000 |
what's around us and where we are and what we can do with it. 00:17:05.000 |
You have to go beyond just what we would normally call 00:17:08.000 |
the stuff of perception to, say, the thoughts in somebody's head 00:17:19.000 |
and I'm not going to go into any of the details of how this works, 00:17:22.000 |
and this is just notional, this is just a picture, 00:17:24.000 |
it's like just a sketch from a grant proposal 00:17:28.000 |
but it's based on a lot of scientific understanding 00:17:32.000 |
There are different parts of the brain that correspond 00:17:34.000 |
to these different modules in our architecture, 00:17:36.000 |
as well as some kind of emerging engineering understanding, 00:17:40.000 |
at the software and maybe even hardware levels, of how these modules might work. 00:17:47.000 |
There's a first stage, which is like bottom-up visual or other perceptual input. 00:17:50.000 |
That's the kind of thing that is pretty close to what today's deep pattern recognizers do. 00:17:58.000 |
But the output of that isn't just pattern class labels, 00:18:04.000 |
So again, an understanding of space and objects, 00:18:14.000 |
this is what we call the brain OS in this picture. 00:18:29.000 |
into the really core cognitive representations 00:18:33.000 |
And then if we're going to start to talk about it in language 00:18:36.000 |
or to build plans on top of what we have seen and understood, 00:18:40.000 |
that's where we talk about symbols coming into the picture, 00:18:44.000 |
the building blocks of language and plans and so on. 00:18:48.000 |
So now we might say, well, okay, this is an architecture 00:18:52.000 |
that is brain-inspired and cognitively inspired, 00:18:55.000 |
and which we're planning to turn into real engineering. 00:19:03.000 |
Maybe the engineering toolkit that's currently 00:19:05.000 |
been making a lot of progress in, let's say, industry, 00:19:09.000 |
Maybe let's take deep learning, but to stand for a broader set of machine learning tools, 00:19:16.000 |
and say, okay, well, maybe that can scale up to this. 00:19:22.000 |
I'm happy in the question period if people want to debate this. 00:19:29.000 |
I don't mean, like, it can't happen or it won't happen. 00:19:38.000 |
reverse engineering approach, and that at least 00:19:42.000 |
that industry incentives especially optimize for, 00:19:45.000 |
it's not even really trying to take us to these things. 00:19:52.000 |
as pattern recognition very much of a success. 00:19:59.000 |
or even play around with in certain publicly available 00:20:02.000 |
data sets, feels like we've made great progress. 00:20:04.000 |
And this is an aspect of visual intelligence, 00:20:12.000 |
You know, basically there's been a bunch of systems. 00:20:25.000 |
A couple of years ago, I think, there were several that came out 00:20:29.000 |
around the same time from basically all the major 00:20:31.000 |
industry computer vision groups as well as a couple 00:20:38.000 |
by some Microsoft researchers and other collaborators, 00:20:41.000 |
trained a combination of deep convolutional neural networks 00:20:47.000 |
with recurrent neural networks, which had recently been applied to 00:20:51.000 |
statistical language modeling, glued them together, 00:20:54.000 |
and produced a system which got very impressive results 00:20:57.000 |
on a big training set and a held-out test set, 00:21:00.000 |
where the goal was to take an image and write a caption for it. 00:21:09.000 |
And these systems surpassed human-level accuracy 00:21:12.000 |
on the held-out test set from a big training set. 00:21:15.000 |
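To make the recipe he's describing concrete, here is a minimal sketch of that kind of encoder-decoder captioner -- a convolutional network glued to a recurrent language model. This is not the Microsoft system; the layer sizes, names, and everything else here are invented purely for illustration.

```python
# Toy CNN-encoder / RNN-decoder captioner, in the spirit of the systems
# described above. Not any real production model; sizes are arbitrary.
import torch
import torch.nn as nn

class TinyCaptioner(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=128, hidden_dim=256):
        super().__init__()
        # Convolutional encoder: image -> a single feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.img_to_h = nn.Linear(64, hidden_dim)
        # Recurrent decoder: the image vector initializes a word-level LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).flatten(1)             # (B, 64)
        h0 = torch.tanh(self.img_to_h(feats)).unsqueeze(0)  # (1, B, H)
        c0 = torch.zeros_like(h0)
        emb = self.embed(captions)                          # (B, T, E)
        hidden, _ = self.lstm(emb, (h0, c0))
        return self.out(hidden)                             # (B, T, vocab) word logits

# Smoke test with random data.
model = TinyCaptioner()
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 7)))
print(logits.shape)  # torch.Size([2, 7, 1000])
```

Trained on paired images and captions, this kind of model maps image features to likely word sequences -- which is exactly why, as he argues next, it inherits the statistics of its training photographs rather than any understanding of the scene.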
But what you can see when you really dig into these things 00:21:23.000 |
is that the system is overfitting to the particular biases 00:21:27.000 |
of whatever data set it came from, a certain set of photographs, 00:21:32.000 |
and even with a big data set, it's not about quantity. 00:21:39.000 |
So one way to test this system is to apply it 00:21:42.000 |
to what seems like basically the same problem, 00:21:45.000 |
but not within a certain curated or built data set. 00:21:49.000 |
And there's a convenient Twitter bot that lets you do this. 00:21:56.000 |
It uses one of the industry AI captioning systems, a very good one. 00:22:00.000 |
I'm not trying to critique these systems for what they're trying to do. 00:22:02.000 |
I'm just trying to point out what they don't really do. 00:22:08.000 |
The bot just every couple of hours takes a random image 00:22:10.000 |
from the web, captions it, and uploads the results to Twitter. 00:22:15.000 |
When I was preparing a first version of this talk, I just took a few days of its output. 00:22:20.000 |
I didn't take every single image, but I took, you know, 00:22:26.000 |
a representative sample of the kinds of successes and the kinds of failures that such a system will make. 00:22:28.000 |
So we can go through this, and it's a little bit of a tour 00:22:35.000 |
of a few days in the life of one of these CaptionBots. 00:22:39.000 |
So here we have a picture of a person holding-- 00:22:44.000 |
and I can't read up there, so maybe you'll have to tell me 00:22:46.000 |
what it says--but a person holding a cell phone. 00:22:52.000 |
Well, it's not a person holding a cell phone -- 00:23:00.000 |
it's a person holding some kind of musical instrument, right? 00:23:06.000 |
I would call that an A result, maybe even A+. 00:23:10.000 |
Here's a group of people standing on top of a mountain. 00:23:19.000 |
because in the data set they were trained on, 00:23:21.000 |
there's a lot of people, and people often talk about people. 00:23:28.000 |
there you did some of my cognitive activities 00:23:36.000 |
A building with a cake, a large stone building with a clock tower. 00:23:39.000 |
I think that's pretty good. I'd give that like a B+. 00:23:47.000 |
Here's a truck parked on the side of a building. 00:23:53.000 |
but it's not a truck, and it doesn't seem like the main thing in the image. 00:24:02.000 |
This is pretty good. I'd give this like an A- or B+, 00:24:04.000 |
because there is a ship in the water, but it's not very large. 00:24:07.000 |
It's really more of like a tugboat or something. 00:24:13.000 |
No, but in another sense, it's really missing 00:24:15.000 |
what's actually interesting and important and meaningful to humans. 00:24:28.000 |
I don't know what kind of weird way of saying it. 00:24:34.000 |
A group of people that are standing in the grass near a bridge. 00:24:36.000 |
Again, there's two people, and there's some grass, and there's a bridge, 00:24:45.000 |
There's a boat, there's a group of people, they're standing, 00:24:48.000 |
but again, the sentence that you see is more based on a bias 00:24:51.000 |
of what people have said in the past about images that are only vaguely like this. 00:24:59.000 |
A large clock mounted to the side of a building. 00:25:05.000 |
A building with snow on the ground. A little bit less good. 00:25:09.000 |
Some people who--I don't know them, but I bet that's probably right, 00:25:12.000 |
because identifying faces and recognizing people who are famous 00:25:17.000 |
probably I would trust current pattern recognition systems to get that. 00:25:25.000 |
I think there's a guy in there, but we didn't get him. 00:25:29.000 |
Again, there is sort of a person, and there's some puddles, 00:25:43.000 |
A plate with a fork and knife. A clear blue sky. 00:25:47.000 |
Again, if you actually go and play with this system, 00:25:50.000 |
partly because I think--my friends at Microsoft told me they've improved it some. 00:25:56.000 |
I chose what also would be the funnier example. 00:26:01.000 |
I'm not trying to take away what are impressive AI technologies, 00:26:05.000 |
but I think it's clear that there's a sense of understanding 00:26:08.000 |
in any one of these images that it's important to see 00:26:13.000 |
if it can make the kind of errors that it makes, 00:26:19.000 |
and it's probably not even trying to scale towards the dimensions 00:26:22.000 |
of intelligence that we think about when we're talking about human intelligence. 00:26:26.000 |
Another way to put this--I'm going to show you a really insightful blog post 00:26:31.000 |
In a couple of days, I'm not sure, you're going to have Andrej Karpathy, 00:26:34.000 |
who's one of the leading people in deep learning. 00:26:38.000 |
This is a really great blog post he wrote a couple of years ago 00:26:45.000 |
He worked at Google a little bit on some early big neural net AI projects there. 00:26:51.000 |
He was at OpenAI. He was one of the founders of OpenAI. 00:26:54.000 |
Recently, he joined Tesla as their director of AI research. 00:26:58.000 |
About five years ago, he was looking at the state of computer vision 00:27:02.000 |
from a human intelligence point of view and lamenting how far away we were. 00:27:06.000 |
This is the title of his blog post, "The State of Computer Vision and AI." 00:27:12.000 |
He took this image, which was a famous image in its own right. 00:27:17.000 |
It was a popular image of Obama back when he was president 00:27:20.000 |
playing around as he liked to do when he was on tour. 00:27:22.000 |
If you take a look at this, you can see you probably all can recognize 00:27:28.000 |
but you can also get the sense of where he is and what's going on. 00:27:31.000 |
You might see people smiling, and you might get the sense 00:27:33.000 |
that he's playing a joke on someone. Can you see that? 00:27:36.000 |
How do you know that he's playing a joke and what that joke is? 00:27:40.000 |
As Andrej goes on to talk about in his blog post, 00:27:43.000 |
if you think about all the things that you have to really deploy in your mind 00:27:49.000 |
Of course, it starts with seeing people and objects 00:27:55.000 |
notice his foot on the scale and understand enough about how scales work 00:27:59.000 |
that when a foot presses down, it exerts force, 00:28:03.000 |
It doesn't just magically measure people's weight, 00:28:07.000 |
You have to see who can see that he's doing that 00:28:13.000 |
and why some people can see that he's doing that 00:28:15.000 |
and can see that some other people can't see it, 00:28:19.000 |
Someday we should have machines that can understand this, 00:28:23.000 |
but hopefully you can see why the kind of architecture 00:28:28.000 |
that I'm talking about would be the building blocks 00:28:31.000 |
or the ingredients to be able to get them to do that. 00:28:34.000 |
Again, I prepared a version of this talk a few months ago, 00:28:37.000 |
and I wrote to Andrej and I said I was going to use this, 00:28:40.000 |
and I was curious if he had any reflections on this 00:28:44.000 |
and where he thought we were relative to five years ago, 00:28:47.000 |
because certainly a lot of progress has been made. 00:28:55.000 |
and that's one of the many reasons why he's such an important person right now in AI. 00:28:58.000 |
He's both very technically strong and honest about what we can do, what we can't do, 00:29:04.000 |
"It's nice to hear from you. It's fun you should bring this up. 00:29:06.000 |
I was also thinking about writing a return to this." 00:29:09.000 |
And in short, basically, I don't believe we've made very much progress. 00:29:12.000 |
He points out that in his long list of things that you'd need to understand the image, 00:29:16.000 |
we have made progress on some--the ability to, again, detect people 00:29:19.000 |
and do face recognition for well-known individuals. 00:29:24.000 |
And he wasn't particularly optimistic that the current route that's being pursued in industry 00:29:28.000 |
is anywhere close to solving or even really trying to solve these larger questions. 00:29:38.000 |
what we see is, again, represents the same point. 00:29:42.000 |
It says, "I think it's a group of people standing next to a man in a suit and tie." 00:29:49.000 |
It just doesn't go far enough, and the current ideas of build a data set, 00:29:54.000 |
train a deep learning algorithm on it, and then repeat 00:29:58.000 |
aren't really even, I would venture, trying to get to what we're talking about. 00:30:03.000 |
Or here's another--I'll just give you one other example of a couple of photographs 00:30:06.000 |
from my recent vacation in a nice, warm, tropical locale, 00:30:11.000 |
which I think illustrate ways in which, again, the gap where we have machines 00:30:23.000 |
Well, of course, we don't even need reinforcement learning or deep learning 00:30:26.000 |
to build a machine that can win or tie, do optimally in tic-tac-toe. 00:30:32.000 |
This is a real tic-tac-toe game, which I saw on the grass outside my hotel. 00:30:37.000 |
What do you have to do to look at this and recognize that it's a tic-tac-toe game? 00:30:41.000 |
You have to see what's--in some sense, there's a 3-by-3 grid, 00:30:46.000 |
It's only delimited by these ropes or strings. 00:30:51.000 |
It's not actually a grid in any simple geometric sense. 00:30:55.000 |
But yet a child can look at that--and indeed, here's an actual child 00:30:57.000 |
who was looking at it--and recognize, "Oh, it's a game of tic-tac-toe," 00:31:00.000 |
and even know what they need to do to win, namely put the X and complete it, 00:31:07.000 |
You show this sort of thing, though, to one of these image-understanding 00:31:10.000 |
caption bots, and I think it's a close-up of a sign. 00:31:14.000 |
Again, saying that this is a close-up of a sign is not the same thing, 00:31:22.000 |
I would venture, as a cognitive or computational activity 00:31:25.000 |
that's going to give us what we need to, say, recognize the object, 00:31:28.000 |
to recognize it as a game, to understand the goal, 00:31:32.000 |
Whereas this kind of architecture is designed to try to do 00:31:37.000 |
I bring in these examples of games or jokes to really show where perception 00:31:43.000 |
goes to cognition, all the way up to symbols. 00:31:47.000 |
So to get objects and forces and mental states, that's the cognitive core, 00:31:52.000 |
but to be able to get goals and plans and what do I do 00:31:59.000 |
Here's another way into this, and it's one that also motivates, I think, 00:32:02.000 |
a lot of really good work on the engineering side, 00:32:04.000 |
and a lot of our interest in the science side, is think about robotics 00:32:11.000 |
what does the brain have to be like to control the body? 00:32:14.000 |
So again, you're going to hear from, shortly, I think maybe it's next week, 00:32:18.000 |
from Marc Raibert, who's one of the founders of Boston Dynamics, 00:32:22.000 |
which is one of my favorite companies anywhere. 00:32:25.000 |
They're without doubt the leading maker of humanoid robots, 00:32:32.000 |
They have all sorts of other really cool robots, robots like dogs, 00:32:36.000 |
robots that have -- I think you'll even get to see a live demonstration 00:32:40.000 |
of one of these robots. It's really awesome, impressive stuff. 00:32:44.000 |
But what about the minds and brains of these robots? 00:32:47.000 |
Well, again, if you ask Marc, ask them how much of human-like cognition 00:32:51.000 |
do they have in their robots, and I think he would say very little. 00:32:54.000 |
In fact, we have asked him that, and he would say very little. 00:32:59.000 |
He's actually one of the advisors of our center, and I think in many ways 00:33:02.000 |
we're very much on the same page. We both want to know, 00:33:05.000 |
how do you build the kind of intelligence that can control these bodies 00:33:11.000 |
Here's another example of an industry robotics effort. 00:33:13.000 |
This is Google's Arm Farm, where they've got lots of robot arms, 00:33:16.000 |
and they're trying to train them to pick up objects using various kinds 00:33:19.000 |
of deep learning and reinforcement learning techniques. 00:33:22.000 |
I think it's one approach. I just think it's very, very different 00:33:25.000 |
from the way humans learn to, say, control their body and manipulate objects. 00:33:29.000 |
You can see that in terms of things that go back to what you were saying 00:33:32.000 |
when you were introducing me. Think about how quickly we learn things. 00:33:35.000 |
Here you have the Arm Farmers trying to generate, effectively, 00:33:39.000 |
maybe if not infinite, but hundreds of thousands, millions of examples 00:33:43.000 |
of reaches and pickups of objects, even with just a single gripper. 00:33:47.000 |
Yet a child, who in some ways can't control their body 00:33:50.000 |
nearly as well as robots can be controlled at the low level, 00:33:56.000 |
I'll show you two of my favorite videos from YouTube here, 00:33:59.000 |
which motivate some of the research that we're doing. 00:34:01.000 |
The one on the left is a one-and-a-half-year-old, 00:34:05.000 |
Just watch this one-and-a-half-year-old here doing a popular activity 00:34:20.000 |
Okay, so he's doing this stacking cup activity. 00:34:27.000 |
He's got a stack of three, and what you can see 00:34:29.000 |
from the first part of this video is it looks like he's trying 00:34:32.000 |
to make a second stack that he's trying to pick up at once. 00:34:35.000 |
Basically, he's trying to make a stack of two 00:34:40.000 |
And he's trying to debug his plan, because it got a little bit stuck here. 00:34:44.000 |
And think about it. I mean, again, if you know anything 00:34:48.000 |
about robots manipulating objects, even just what he just did, 00:34:51.000 |
no robot can decide to do that and actually do it. 00:34:54.000 |
At some point, he's almost got it. It's a little bit tricky, 00:34:57.000 |
but at some point he's going to get that stack of two. 00:35:00.000 |
He realizes he has to move that object out of the way. 00:35:02.000 |
Look at what he just did. Move it out of the way, 00:35:06.000 |
And now he's got a stack of two on a stack of three, 00:35:12.000 |
He's got a stack of five, because he knows he accomplished 00:35:14.000 |
a key waypoint along the way to his final goal. 00:35:17.000 |
That's a kind of early symbolic cognition, right? 00:35:19.000 |
To understand that I'm trying to build a tall tower, 00:35:24.000 |
And you can take a tower and put it on top of another tower, 00:35:27.000 |
or stack a stack on a stack, and you have a bigger stack. 00:35:30.000 |
So think about how he goes from bottom-up perception 00:35:33.000 |
to the objects, to the physics needed to manipulate the objects, 00:35:36.000 |
to the ability to make even those early kinds of symbolic plans. 00:35:49.000 |
At the end, he finishes the whole tower, and he gives himself another big hand, but falls over. 00:35:59.000 |
But all the other stuff to get to that point, 00:36:01.000 |
we don't really know how to do in a robotic setting. 00:36:03.000 |
Or think about this baby here. This is a younger baby. 00:36:06.000 |
This is one of the Internet's very most popular baby videos. 00:36:23.000 |
What he's decided to do, I guess, is to stack up cups on the back of a cat. 00:36:25.000 |
He's asking, "How many cups can I fit on the back of a cat?" 00:36:33.000 |
OK, well, he can't fit more than three, it turns out. 00:36:43.000 |
Now watch that part when he reaches back behind him there. 00:36:45.000 |
That's--I'll just pause it there for a moment. 00:36:49.000 |
that's a particularly striking moment in the video. 00:36:53.000 |
It's a vivid demonstration of what we call in cognitive science object permanence: 00:36:59.000 |
representing objects as these permanent, enduring entities in the world, 00:37:03.000 |
In this case, he hadn't seen or touched that object behind him 00:37:12.000 |
and he was able to incorporate it in his plan. 00:37:14.000 |
There's a moment before that when he's about to reach for it, 00:37:18.000 |
And it's only when he's now exhausted all the other objects here 00:37:20.000 |
that he can see, he's like, "OK, now time to get this object 00:37:24.000 |
So think about what has to be going on in his brain to do that. 00:37:28.000 |
That's like the analog of you understanding what's behind you. 00:37:31.000 |
It's not that these things are impossible to capture machines. 00:37:35.000 |
It's just that training a deep neural network on labeled examples isn't going to get you there. 00:37:41.000 |
But we think by reverse engineering how it works in the brain, we have a real chance. 00:37:47.000 |
It's not just humans that do this kind of activity. 00:37:49.000 |
Here's a couple of, again, rather famous videos. 00:37:53.000 |
Crows are famous object manipulators and tool users, 00:37:55.000 |
but also orangutans, other primates, rodents. 00:37:59.000 |
We can watch--here, let me pause this one for a second. 00:38:02.000 |
If we watch this orangutan here, he's got a bunch of big Legos, and he's stacked them up into a tall tower. 00:38:27.000 |
Some people think the video was actually filmed backwards, 00:38:31.000 |
and the orangutan just slowly disassembled it piece by piece. 00:38:33.000 |
And it turns out it's remarkably hard to tell 00:38:35.000 |
whether it's played forward or backwards in time, 00:38:41.000 |
if an orangutan actually was able to build up 00:38:45.000 |
a tower like that. But I would submit that it would be almost as impressive 00:38:53.000 |
to take it apart piece by piece, when the easiest thing to do would just be to knock it over. 00:39:09.000 |
And what you'll see here over the course of this video 00:39:15.000 |
is a crow trying to get an object over a barrier, one that they're hoping to bring back to their nest. 00:39:19.000 |
And at some point, after just trying and trying to get it over, 00:39:37.000 |
he decides, "Okay, I'm just going to come back." 00:39:41.000 |
And he tries one more time, and this time valiantly gets it over. 00:39:47.000 |
You can clap for me at the end, or clap for whoever later. 00:39:53.000 |
But again, think what had to be going on in his brain 00:40:05.000 |
These other ones are, some of them actually were 00:40:09.000 |
But this is one that motivates a lot of the science 00:40:21.000 |
The kids in this experiment were the same age as 00:40:23.000 |
the first baby I showed you, the one who did the stacking. 00:40:29.000 |
interested in intelligence, for reasons we can talk 00:40:47.000 |
do things that are kind of like what this human did, 00:41:03.000 |
basically no chimp did what you're going to see 00:41:09.000 |
and I'll turn on the sound here if you can hear it, 00:41:23.000 |
stops and then the kid just does whatever they want 00:41:39.000 |
time and think about what's got to be going inside 00:41:45.000 |
what it looks like to us is the kid figured out that this 00:41:47.000 |
guy needed help and helped him. And the paper 00:42:03.000 |
details from what you might have seen before. 00:42:05.000 |
And there's other ones in there that are really truly novel because 00:43:09.000 |
he makes a prediction about what the guy's going to do, 00:43:19.000 |
If I did the right thing to help you, then I expect 00:43:23.000 |
So you can see these things happening, and we 00:43:25.000 |
want to know what's going on inside the mind that guides 00:43:31.000 |
that we're working on over the next few years, 00:43:43.000 |
is to build a robot that could do what this kid and many other 00:43:45.000 |
kids in these experiments do, to say, "Help you 00:43:47.000 |
out around the house without having to be programmed 00:43:51.000 |
get a sense. Oh yeah, you need a hand with that? 00:43:59.000 |
they'll try to help and really do the opposite. 00:44:09.000 |
reliable engineering technology. That would be 00:44:13.000 |
related to, say, machines that you could actually 00:44:19.000 |
So how are we going to do this? Well, let me spend 00:44:21.000 |
the rest of the time talking about how we try to model common-sense 00:44:49.000 |
cognition. So when I say we have an intuitive 00:44:51.000 |
understanding of physical objects and people's goals, what I mean is something like a probabilistic program. 00:44:57.000 |
Probabilistic programs, a little bit more technically, 00:45:03.000 |
generalize Bayesian networks or other kinds of directed graphical models, 00:45:17.000 |
giving a much more expressive toolkit of knowledge representation, 00:45:21.000 |
a richer set of algorithmic tools for representing knowledge, 00:45:25.000 |
joined to the ability to do probabilistic inference 00:45:31.000 |
beyond what you can do in a directed graphical model. So for those of you 00:45:33.000 |
who know about graphical models, that might make some 00:45:35.000 |
sense to you. But just more broadly, what this is, 00:45:43.000 |
deep learning era, but over... if you look back over 00:45:51.000 |
more, but definitely like three ideas we can really 00:45:59.000 |
ideas when the mainstream of the field thought 00:46:01.000 |
this was totally the way to go and every other idea 00:46:29.000 |
inspired architectures for pattern recognition. 00:46:37.000 |
neural networks, has some distinctive strengths 00:46:47.000 |
as an outstanding challenge for neural networks, 00:46:51.000 |
to take knowledge across a number of previous 00:46:53.000 |
tasks to transfer to others. This is a real challenge 00:46:55.000 |
and has always been a challenge in a neural net. 00:47:01.000 |
It's much more natural in, for example, a hierarchical Bayesian model. 00:47:03.000 |
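As an aside on why transfer is natural in that setting, here is a toy sketch of the hierarchical idea -- not any specific CBMM model, just an invented beta-binomial illustration where the prior learned across old tasks (the "overhypothesis") is what transfers to a new one-shot task.

```python
# Toy hierarchical Bayesian transfer. Each old task t yields Bernoulli data
# with its own rate theta_t; fitting a shared Beta(a, b) prior across tasks
# gives a strong "overhypothesis" that a brand-new task inherits.
def fit_shared_prior(task_rates):
    # Method-of-moments fit of Beta(a, b) to the per-task rates.
    m = sum(task_rates) / len(task_rates)
    v = sum((r - m) ** 2 for r in task_rates) / len(task_rates)
    strength = m * (1 - m) / max(v, 1e-6) - 1
    return m * strength, (1 - m) * strength

# Twenty previous tasks whose success rates all hover around 0.8.
rates = [0.75, 0.8, 0.85, 0.78, 0.82] * 4
a, b = fit_shared_prior(rates)

# New task: a single success. The posterior mean is pulled toward 0.8,
# not the maximum-likelihood estimate of 1.0 -- that's one-shot transfer.
k, n = 1, 1
print("shared prior strength a+b =", round(a + b, 1))
print("posterior mean for new task =", round((a + k) / (a + b + n), 3))
```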
And if you look at some of the recent attempts, 00:47:07.000 |
in the deep learning world to try to get kinds of transfer 00:47:09.000 |
learning and learning to learn, they're really cool. 00:47:35.000 |
in computer systems and programming languages. 00:47:45.000 |
to build knowledge representations which are as expressive 00:47:47.000 |
as anything that anybody ever did in the symbolic tradition, 00:47:53.000 |
and as able to learn from data as anything in the probabilistic paradigm, 00:48:07.000 |
There's a number of actually implemented tools. 00:48:11.000 |
a number of probabilistic programming languages. 00:48:17.000 |
Church is one that we developed in our group a few years ago, almost 10 years ago now, 00:48:23.000 |
built on a functional programming core. So Church is a probabilistic Lisp, basically. 00:48:43.000 |
There's also Venture in here, which is a project of Vikash Mansinghka's. 00:49:05.000 |
necessary for making the kind of architecture that 00:49:13.000 |
think, again, many people are using some version 00:49:15.000 |
of this idea, but maybe a little bit different 00:49:21.000 |
that I like to talk about is what I call the game engine in your head. That's what these 00:49:27.000 |
programs are about. When I talk about probabilistic programs, that's the 00:49:31.000 |
kind of programs we're using. We're just basically -- 00:49:33.000 |
these probabilistic programming languages at their best 00:49:49.000 |
inferences -- conditional inferences -- are computable, 00:50:03.000 |
thinking about the kinds of programs that are in modern video game engines. 00:50:07.000 |
So again, probably most of you are familiar with these, 00:50:09.000 |
but if you're -- and increasingly they're playing 00:50:11.000 |
a role in all sorts of ways in AI. But these are 00:50:13.000 |
tools that were developed by the video game industry 00:50:21.000 |
in some sense, most of the hard technical work 00:50:37.000 |
is not just rendering graphics, but having things interact with the world in real 00:50:43.000 |
time, in an interactive way as the player moves around, 00:50:47.000 |
and populating the world with non-player characters that 00:50:49.000 |
will behave in an even vaguely intelligent way. 00:50:53.000 |
tools for doing all of this without having to write everything from scratch. 00:51:25.000 |
Let's say the player is going to attack this base. So back in 00:51:31.000 |
the old days, you'd have things that would fire missiles kind of randomly. 00:51:35.000 |
But let's say you want a guard to be a little intelligent, 00:51:39.000 |
"Oh, and I see the player," and then to actually start 00:51:41.000 |
shooting at you and to even maybe pursue you. 00:51:51.000 |
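For flavor, here is roughly what that kind of non-player-character logic looks like: a guard that notices the player and switches from patrolling to pursuit. This is purely illustrative Python, not any real engine's API; the names and numbers are made up.

```python
# Toy NPC: a guard that "sees" the player within a sight range and pursues.
import math

def guard_step(guard_pos, player_pos, sight_range=5.0, state="patrol"):
    dist = math.dist(guard_pos, player_pos)
    if dist <= sight_range:
        state = "pursue"  # "Oh, I see the player" -> start chasing
    if state == "pursue" and dist > 0:
        # Move one unit step toward the player.
        dx, dy = player_pos[0] - guard_pos[0], player_pos[1] - guard_pos[1]
        guard_pos = (guard_pos[0] + dx / dist, guard_pos[1] + dy / dist)
    return guard_pos, state

pos, state = (0.0, 0.0), "patrol"
for t in range(6):
    pos, state = guard_step(pos, (4.0, 3.0), state=state)
    print(t, state, tuple(round(c, 2) for c in pos))
```

Even this vaguely intelligent behavior amounts to a little model of an agent with perception and a goal, which is the bridge to the hypothesis he states next.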
and some of you might think this is crazy, and some 00:51:53.000 |
of you might think this is a very natural idea 00:52:09.000 |
representations that evolution has built into our brains, 00:52:25.000 |
wrapped inside a framework for probabilistic inference -- 00:52:27.000 |
that's what we mean by probabilistic programs. 00:52:39.000 |
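To make "probabilistic program" concrete: it's a stochastic simulation plus the ability to condition on observations. Languages like Church make the conditioning a primitive; here is the same idea hand-rolled in Python with rejection sampling, on an invented noisy-OR toy model (not an example from the talk).

```python
# A probabilistic program = a generative simulation + conditioning.
import random

def flip(p=0.5):
    return random.random() < p

def model():
    # Generative story: two independent hidden causes, a noisy-OR effect.
    cause_a = flip(0.3)
    cause_b = flip(0.1)
    effect = cause_a or cause_b or flip(0.05)
    return cause_a, effect

def posterior_cause_a_given_effect(num_samples=100_000):
    # Rejection sampling: run the simulation, keep runs where the
    # observation (effect == True) holds, and read off cause_a.
    kept = [a for a, e in (model() for _ in range(num_samples)) if e]
    return sum(kept) / len(kept)

# Prior P(cause_a) = 0.3; observing the effect raises it to ~0.75.
print(round(posterior_cause_a_given_effect(), 2))
```

The same pattern -- simulate forward, condition on what you saw -- is what the intuitive physics work below does, just with a physics engine as the simulator.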
Here's some work that we did in our group, that Pete Battaglia led, 00:52:49.000 |
and this is also an illustration of a kind of 00:52:51.000 |
experiment that you might do. When I keep talking 00:52:53.000 |
about science, like I'll show you now a couple of experiments. 00:52:59.000 |
We show people simple physical world scenes, and ask them to make a number of judgments. 00:53:03.000 |
basically a little bit of probabilistic inference 00:53:17.000 |
"If they fall, how far will they fall?" or "Which 00:53:19.000 |
way will they fall?" or "What would happen if 00:53:27.000 |
stuff?" or vice versa. "How will that change the 00:53:29.000 |
direction of fall?" or "Look at those red and 00:53:33.000 |
yellow blocks, which look like they should be falling, but aren't." 00:53:39.000 |
Maybe that's evidence that one color block is much heavier than the other. 00:54:01.000 |
Some of you might have seen me talk about these things, so you might have seen 00:54:03.000 |
this task, but probably only if you saw me give a talk 00:54:07.000 |
This is the red-yellow task, and again, we'll make this one interactive. 00:54:15.000 |
"If the table were bumped hard enough to knock some of the blocks 00:54:19.000 |
onto the floor, are they more likely to be red blocks or yellow blocks?" What do you say? 00:54:43.000 |
just experience for yourself what it's like to 00:54:45.000 |
be a subject in one of these experiments. We just 00:54:47.000 |
did the experiment here. The data's all captured on 00:54:53.000 |
sometimes people were very quick, other times 00:54:55.000 |
people were slower. Sometimes there was a lot of 00:54:57.000 |
consensus, sometimes there was a little bit less consensus. 00:55:01.000 |
So again, there's a long history of studying this 00:55:09.000 |
inference at work. Probabilistic inference over simulations. 00:55:37.000 |
sense reasoning, the same thing happened. Namely, all 00:55:39.000 |
the yellow blocks went over onto one side of the table 00:55:43.000 |
So it didn't matter which of those simulations 00:55:45.000 |
you ran in your head. You'd get the same answer in this case. And 00:55:51.000 |
you didn't have to run the simulation for very long. 00:55:55.000 |
like that to see what's going to happen, or similarly here. 00:55:57.000 |
You only have to run it for a few time steps. 00:55:59.000 |
And it doesn't have to be even very accurate. 00:56:03.000 |
A crude, noisy simulation will give you basically the same answer at the 00:56:05.000 |
level that matters for common sense. So that's 00:56:07.000 |
the kind of thing our model does. It runs a few 00:56:11.000 |
time steps. But if you take the average of what 00:56:13.000 |
happens there and you compare that with people's judgments, you get these plots: 00:56:21.000 |
on the y-axis, the average judgments of people; on the x-axis, the average judgments 00:56:23.000 |
of this model. And it does a pretty good job. It's not 00:56:35.000 |
But I'll come back to that in a second. It just does it by 00:56:37.000 |
probabilistic reasoning over a game physics simulation. 00:56:41.000 |
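Here is a toy version of that prediction-by-simulation loop, under stated assumptions: instead of a 3D game-physics engine, a crude 2D balance check stands in for physics; perceptual noise jitters the block positions; and only a few coarse "time steps" are run before the outcomes are averaged. Nothing here is the actual Battaglia et al. model -- it is just the shape of the computation.

```python
# Toy "intuitive physics engine": a handful of short, noisy simulations,
# averaged into a probability judgment.
import random

def tower_falls(block_offsets, position_noise=0.2, dynamics_noise=0.05, steps=3):
    # Perceptual uncertainty: jitter each block's horizontal position.
    offsets = [x + random.gauss(0, position_noise) for x in block_offsets]
    for _ in range(steps):
        # Each coarse time step adds a little simulation noise, then checks
        # whether any block overhangs the one below by over half a width.
        offsets = [x + random.gauss(0, dynamics_noise) for x in offsets]
        if any(abs(offsets[i + 1] - offsets[i]) > 0.5
               for i in range(len(offsets) - 1)):
            return True
    return False

def prob_fall(block_offsets, num_simulations=1000):
    falls = sum(tower_falls(block_offsets) for _ in range(num_simulations))
    return falls / num_simulations

print("aligned tower:", prob_fall([0.0, 0.05, 0.1]))  # low fall probability
print("leaning tower:", prob_fall([0.0, 0.35, 0.7]))  # high fall probability
```

Note how few and how short the simulations are: as he says above, a handful of noisy, coarse rollouts already separates the stable tower from the precarious one at the level that matters for common sense.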
And we have used the same kind of technology 00:56:55.000 |
to capture people's understanding of other people's actions, what we call 00:57:05.000 |
intuitive psychology. I won't go into any details. I'll just point to a couple 00:57:13.000 |
of projects people are working on. Both of these are experiments with infants younger 00:57:19.000 |
than even some of the babies I showed you before, 00:57:21.000 |
but basically like that youngest baby, the one 00:57:37.000 |
You'll see the scene is occluded and then after another 00:57:39.000 |
period of time, one of the objects will appear at the 00:57:51.000 |
Just like an adult, if I show you something that's surprising 00:58:03.000 |
you can measure whether you've shown them something surprising 00:58:07.000 |
There are literally hundreds of studies, if not more, 00:58:17.000 |
What we did was build a quantitative model where we were able to show 00:58:21.000 |
a relation between looking time in this case and surprise. So things which were objectively 00:58:25.000 |
more surprising under these probabilistic physics simulations, across 00:58:31.000 |
when the scene was occluded, how long the delay was, 00:58:33.000 |
various physically relevant variables. How many 00:58:51.000 |
This is an experiment from Liz Spelke's lab -- they're at Harvard, but they're part of our center -- 00:58:55.000 |
which was about infants' understanding of goals. 00:58:57.000 |
So this is more like, again, understanding of agents 00:59:05.000 |
that seems to be doing something, like an animated character, 00:59:15.000 |
and the question is how much does the agent want the goal that it seems to be pursuing. 00:59:37.000 |
We think of this as representing what we sometimes 00:59:47.000 |
call a naive utility calculus: taking actions that are a little bit costly to achieve goal states that are valuable. 00:59:51.000 |
That cost-benefit trade-off is the most basic way, the oldest way, to think about rational action. 00:59:55.000 |
It seems that even 10-month-olds understand some 00:59:57.000 |
version of that, where the cost can be measured 01:00:05.000 |
I want to leave some time for discussion. So I'll just go 01:00:07.000 |
very quickly through a couple of other things, 01:00:15.000 |
What I showed you here was the science. Where does the 01:00:21.000 |
engineering go? Here's a machine system that can look not at a little 01:00:23.000 |
animated cartoon like these baby experiments, 01:00:25.000 |
but a real person doing something. And again, 01:00:29.000 |
it combines the physical constraints of actions with some understanding of goals. 01:00:47.000 |
There are a number of objects here, and she's going to be reaching for one of them. 01:00:51.000 |
but raise your hand when you know which one she's reaching for. 01:01:20.000 |
That was well before she actually touched the object. 01:01:22.000 |
How does the model work? Again, I'll skip the details, 01:01:32.000 |
robotics. So we use what's called the MuJoCo physics 01:01:44.000 |
engine, which can plan an efficient reach given a goal object as input. We can give it each of 01:01:50.000 |
the candidate goal objects, and the one that uses the least energy to get to 01:01:54.000 |
is favored in the inference. This is the probabilistic inference part. 01:02:14.000 |
The bars you see moving up and down, that's the Bayesian posterior, 01:02:20.000 |
and what you can see is it converges on the right answer, 01:02:22.000 |
at least, well, it turns out to be the ground truth 01:02:24.000 |
right answer, but it's also the right answer according to what people say. 01:02:32.000 |
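Here is a toy sketch of that goal inference, with the physics engine swapped out for an assumed straight-line cost: the likelihood of a partly observed reach under each candidate goal falls off with the extra motion it would imply, and the posterior over goals sharpens as the reach unfolds. The cost function, the inverse temperature beta, and the coordinates are all illustrative assumptions.

```python
# Toy inverse planning: Bayesian posterior over candidate goal objects,
# where efficient (low-extra-cost) partial reaches are more probable.
import math

def posterior_over_goals(hand_path, goals, beta=4.0):
    scores = []
    for g in goals:
        direct = math.dist(hand_path[0], g)  # cheapest possible reach
        taken = sum(math.dist(hand_path[i], hand_path[i + 1])
                    for i in range(len(hand_path) - 1))
        taken += math.dist(hand_path[-1], g)  # cost still to go
        # Likelihood ~ exp(-beta * wasted motion) under this goal.
        scores.append(math.exp(-beta * (taken - direct)))
    z = sum(scores)
    return [s / z for s in scores]

goals = [(1.0, 0.0), (0.0, 1.0)]               # two candidate objects
path = [(0.0, 0.0), (0.3, 0.05), (0.6, 0.1)]   # partial reach, drifting right
print([round(p, 2) for p in posterior_over_goals(path, goals)])
# Mass concentrates on the right-hand object well before contact,
# just as the audience's hands went up before the touch.
```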
Now, of course, if I just wanted to build a system that could detect 01:02:34.000 |
what somebody was reaching for, I could generate a big data set and train on it. 01:02:50.000 |
But the point is to generalize to interesting scenes that you haven't really seen 01:02:52.000 |
much of before. So take the scene on the left, 01:03:08.000 |
actually a pane of glass, right? Do you see that? 01:03:22.000 |
and know how to help him? And then how do we look 01:03:24.000 |
at the two of them and figure out who's trying to help and who's trying to hinder? 01:03:34.000 |
We have the beginnings of a model of how that might work, and we think this is the 01:03:36.000 |
kind of model needed to tackle this sort of challenge 01:03:48.000 |
It puts agent models inside each other. So we say, an agent 01:04:00.000 |
is helping if its utility is defined on another agent's expected utility, and that's what it 01:04:04.000 |
looks like here. Hindering is sort of the opposite, if one seems to be trying 01:04:12.000 |
to lower the other agent's utility. And this matches infants' understanding of helping and hindering. 01:04:18.000 |
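The recursion itself is almost a one-liner. In this toy sketch (invented actions and payoffs, nothing from the actual infant model), a helper's utility simply is the other agent's utility, and a hinderer's is its negative:

```python
# Toy "utilities over utilities": helper vs. hinderer.
def best_action(actions, utility):
    return max(actions, key=utility)

# How much the "helpee" values the outcome of each possible action.
helpee_utility = {"open_box": 10, "close_box": -5, "do_nothing": 0}

helper = best_action(helpee_utility, lambda a: helpee_utility[a])
hinderer = best_action(helpee_utility, lambda a: -helpee_utility[a])
print("helper chooses:  ", helper)    # open_box
print("hinderer chooses:", hinderer)  # close_box
```

Inference then runs in the other direction: watching which action an agent takes, you infer whether its utility looks like the helper's or the hinderer's.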
Let me end with learning, because everybody wants to know about learning, and 01:04:26.000 |
the last thing I want to leave you on is really about what learning is about. 01:04:28.000 |
It'll be just a few more slides, and then I'll stop, I promise. 01:04:34.000 |
They certainly didn't do any task-specific learning. 01:04:40.000 |
That's not to say that we don't think people learn to do these things. 01:04:54.000 |
Not that there isn't learning beyond one year, but 01:04:56.000 |
the basic way you learn to, say, solve these physics 01:05:06.000 |
that come from the literature on infant cognitive development, 01:05:14.000 |
really is quite inspiring, I think, and I can give you 01:05:28.000 |
certain understanding of basic aspects of physics. 01:05:32.000 |
If you want to study how people learn to be intelligent, 01:05:36.000 |
at this age. You have to study what's already 01:05:40.000 |
what they learn and how they learn between four, 01:05:56.000 |
abilities, then what we need, if we're going to 01:06:00.000 |
capture them in machines, is what you might think of as a program learning program. 01:06:02.000 |
If your knowledge is in the form of a program, 01:06:04.000 |
then you have to have programs that build other programs. 01:06:06.000 |
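A "program learning program" can be sketched in a few lines. Here is a toy learner that enumerates a space of tiny arithmetic programs and keeps the shortest one consistent with input-output examples. The operator set and the examples are invented; the point is that the hypothesis space is structured programs, not weights.

```python
# Toy program induction: search over compositions of primitive operations
# for the shortest program consistent with the examples.
from itertools import product

OPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2, "sq": lambda x: x * x}

def run(program, x):
    for op in program:
        x = OPS[op](x)
    return x

def learn(examples, max_length=3):
    for length in range(1, max_length + 1):       # shortest programs first
        for program in product(OPS, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None

# From two examples, recover the program "double, then add one".
print(learn([(3, 7), (5, 11)]))  # ('*2', '+1')
```

Preferring the shortest consistent program is a crude stand-in for a Bayesian prior over programs; real systems search vastly larger, richer spaces, which is exactly the "hard problem" he turns to next.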
This is what I was talking about at the beginning 01:06:12.000 |
If what we start off with is something like a game 01:06:16.000 |
engine, then what you have to learn is the program of the game 01:06:18.000 |
that you're actually playing, or the many different games 01:06:20.000 |
that you might be playing over your life. So think 01:06:24.000 |
of learning as programming the engine in your head to fit with your experience. 01:06:30.000 |
Now this is what you could call the hard problem 01:06:32.000 |
of learning if you come to learning from, say, neural 01:06:38.000 |
networks, or the way most kinds of machine learning go right now, and certainly what 01:06:46.000 |
many of the functions you might want to do in 01:06:54.000 |
So you can have one of these nice optimization 01:06:56.000 |
landscapes, you can compute the gradients and basically 01:07:02.000 |
But if you're talking about learning as something like search 01:07:06.000 |
in a space of programs, we don't know how to do anything like that yet. We don't know 01:07:10.000 |
how to set it up as an optimization problem with any notion of smoothness, 01:07:16.000 |
the kind that lets you treat learning as like rolling downhill, effectively. 01:07:30.000 |
There's an idea in cognitive development called the child as scientist, 01:07:44.000 |
and a version of it we like is the child as hacker. But the rest of the world 01:07:48.000 |
thinks of a hacker as someone who breaks into your email and steals 01:07:50.000 |
your credit card numbers. We all know that hacking, in the good sense, 01:08:10.000 |
means making code better -- faster, better suited to its applications or tasks, more explainable -- 01:08:16.000 |
and children pursue versions of those goals in learning. And the activities 01:08:34.000 |
of hacking aren't just adjusting parameters that you can tune on a data set. That's basically 01:08:36.000 |
what you do with backprop or stochastic gradient descent. 01:08:40.000 |
But think about all the ways in which you might 01:08:42.000 |
actually modify the underlying functions. So write new code, 01:08:48.000 |
or make a whole new library of code, or refactor what you have -- 01:09:06.000 |
and children's learning has analogs to all of these activities, 01:09:10.000 |
which we're trying to understand as an engineer, from an algorithmic point of view. 01:09:24.000 |
Here's one example from our group, which you might not have thought of as being about this. 01:09:32.000 |
We had this paper that was in Science; it was actually 01:09:42.000 |
which is partly a testament to the really great 01:09:44.000 |
work that Brendan Lake, who was the first author, did 01:10:00.000 |
on learning simple visual concepts, these handwritten characters in many 01:10:02.000 |
of the world's alphabets. For those of you who know 01:10:14.000 |
not because Yann LeCun, who put that together, 01:10:16.000 |
or Geoff Hinton, who did a lot of work on deep learning 01:10:38.000 |
programs. In this case, what are those programs? 01:10:40.000 |
They're the programs you use to draw a character. 01:10:46.000 |
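In toy form, the idea that a character concept is a motor program might look like this: a stroke-level program for an invented letter, re-executed with motor noise to imagine new exemplars. The actual Bayesian Program Learning model (Lake et al.) composes stroke primitives with relations and does full posterior inference over programs; none of that machinery is captured here.

```python
# A character concept as a drawing program: a list of strokes, each a
# list of pen points. New examples = re-running the program with noise.
import random

letter_T = [
    [(0.0, 1.0), (1.0, 1.0)],  # horizontal bar
    [(0.5, 1.0), (0.5, 0.0)],  # vertical stem
]

def draw(program, motor_noise=0.03):
    # Execute the stroke program with small perturbations on every point,
    # so each execution yields a slightly different exemplar of the concept.
    return [[(x + random.gauss(0, motor_noise), y + random.gauss(0, motor_noise))
             for x, y in stroke] for stroke in program]

for i in range(2):
    exemplar = draw(letter_T)
    print(f"exemplar {i}:",
          [[(round(x, 2), round(y, 2)) for x, y in s] for s in exemplar])
```

Seeing one example and inferring the stroke program behind it is what lets the model both classify and generate new instances from a single exemplar.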
Can you see a new character and imagine how somebody might draw it? The way we tested this 01:10:56.000 |
was a kind of visual Turing test. We compared them side by side: say, on the left 01:11:04.000 |
was the human drawing another example, and on the other, the machine imagining one. 01:11:08.000 |
And people couldn't tell. When I said, "One's on the left, 01:11:10.000 |
one's on the right," I don't actually remember. 01:11:12.000 |
And on different ones, you can see if you can tell. It's very 01:11:14.000 |
hard to tell. Can you tell, for each of these, whether the 01:11:18.000 |
examples were drawn by a human versus a machine? 01:11:36.000 |
And when you see a character, you're working backwards 01:11:38.000 |
to figure out, what was the program, the most 01:11:40.000 |
efficient program that did that? So you're basically doing inference over 01:11:52.000 |
programs, to being able to learn something, ultimately, from just one example. 01:11:56.000 |
The last thing I'll leave you with is just a pointer 01:12:02.000 |
to work by a student who works partly with me, but also with 01:12:16.000 |
Armando Solar-Lezama. And where Armando comes from, which is the world of 01:12:18.000 |
programming languages, not machine learning or 01:12:26.000 |
the machine learning toolkit, in this case a kind of 01:12:42.000 |
So we think by combining these kinds of tools, 01:12:44.000 |
in this case, let's say, from Bayesian inference 01:12:52.000 |
or haven't been considered to be machine learning, 01:12:58.000 |
going forward we're going to be able to build smarter, more 01:13:06.000 |
human-like learning machines. I've tried to identify the ways in which human intelligence 01:13:14.000 |
goes beyond pattern recognition, and to point to some of the domains where we can start to study this: 01:13:16.000 |
in common sense scene understanding, for example, 01:13:22.000 |
one-shot learning, for example, like what we were just doing 01:13:24.000 |
there, or learning as programming the engine in your head. 01:13:34.000 |
example, as well as a little bit of deep learning 01:13:44.000 |
but think about, for those of you who are interested in technology, 01:13:54.000 |
the challenge problem here in our big research agenda. This is the one 01:13:58.000 |
you know, it could be the rest of my career, honestly, 01:14:10.000 |
It's the idea that Turing proposed when he proposed 01:14:14.000 |
this at different times in his life, or many people have 01:14:16.000 |
proposed this, right, which is to build a system 01:14:20.000 |
that does what a child does, that starts like a baby and learns like 01:14:22.000 |
a child, and I've tried to show you how we're 01:14:32.000 |
Don't just imagine that someday we'll be able to build machines 01:14:34.000 |
that can do this. I think we can actually start working on it now, and 01:14:38.000 |
that's something that we're doing in our group. 01:14:42.000 |
I encourage you to work on it, maybe even with us, 01:14:44.000 |
or if any one of these other activities of human 01:14:50.000 |
intelligence appeals to you, take the kind of reverse engineering approach that we're doing 01:15:04.000 |
to try to actually achieve at least some kind of human-like intelligence in machines. 01:15:14.000 |
There's many kinds of AI systems that could live in 01:15:16.000 |
worlds of data that none of us can understand or will 01:15:18.000 |
ever live in ourselves, but if you want to build 01:15:20.000 |
machines that can live in our world and interact with us, this is the route. 01:15:40.000 |
So, earlier in the talk you expressed some skepticism 01:15:44.000 |
that the current industry approach would get us to understanding human-level intelligence. 01:15:50.000 |
But industry is better than academia at accumulating 01:15:56.000 |
data and compute, and at the moment we've got a bit of brain drain going on. 01:16:02.000 |
If you look at something like learning to fly 01:16:18.000 |
Is industry going to overtake the field, do you think? 01:16:20.000 |
Well, that's a really good question, and it's got 01:16:22.000 |
several good questions packed into one there. 01:16:34.000 |
The technologies that are currently getting the most attention in industry, 01:16:36.000 |
and that are really, because they're really the most 01:16:40.000 |
any industry is really focused on what it can do, 01:16:46.000 |
If you ask, say, Google researchers to take on the most 01:16:56.000 |
ambitious ideas, ones that won't pay off in a year or two years, but maybe take five years or more to really 01:16:58.000 |
develop. But if you can't show that it's going to 01:17:08.000 |
talking about is the technologies which right 01:17:18.000 |
where the route is to something like human-like, 01:17:28.000 |
the basic research that we're doing now will be what's needed, and it will 01:17:32.000 |
get the attention of industry when the time is right. 01:17:44.000 |
But you're also pointing to issues of, like, brain drain. 01:17:48.000 |
These are real issues confronting our community. I think 01:17:50.000 |
everybody knows this and I'm sure this will come up 01:18:18.000 |
what industry would consider a snail's pace on 01:18:30.000 |
appreciate or just that really has technological 01:18:42.000 |
learns like a person, then I think we need each other 01:18:44.000 |
now and not just at some point in the future. 01:18:58.000 |
We wanted to raise those issues, and it's just 01:19:06.000 |
a risk, both of what you could call brain drain from the academic 01:19:10.000 |
point of view, and of narrowing in on certain local minima from the 01:19:12.000 |
industry point of view. But solutions will require 01:19:18.000 |
companies like Google being creative about how they 01:19:20.000 |
might work together with academia in ways that are a little bit outside of 01:19:22.000 |
their comfort zone. I hope that will start to happen, 01:19:30.000 |
and I think we need it to happen for the health of the field. 01:20:14.000 |
Well, there's a few reasons, but the slide is only so big. 01:20:20.000 |
There are other versions of my slide in which I do talk about that. 01:20:26.000 |
It's not that I think emotions aren't important. I think they are important, 01:20:32.000 |
but actually, partly based on some of my colleagues' 01:20:46.000 |
work, their own and others', we've been starting to work 01:20:48.000 |
with them on computational models. So that's actually something 01:20:50.000 |
I'm actively interested in and even working on. 01:20:54.000 |
For those of you who study emotion or know about this, actually you're going to have 01:20:58.000 |
Lisa Feldman Barrett here. She's going to basically say a version of the same thing: 01:21:08.000 |
that emotion is a kind of model of ourselves, of the situation we're in, and of other people. 01:21:16.000 |
I mean, again, Lisa will talk all about this, 01:21:18.000 |
but if you think about emotion as just a very 01:21:20.000 |
small set of what are sometimes called basic emotions, 01:21:28.000 |
there's a small number of them, right? There's usually only 01:21:36.000 |
a handful, emotions that are opposed to some kind of cognitive activity. 01:22:06.000 |
So that means you have to be able to understand, 01:22:08.000 |
you have to have a model, you have to be able to 01:22:10.000 |
do a kind of counterfactual reasoning and to think 01:22:12.000 |
oh, if only I had acted a different way, then I 01:22:14.000 |
can predict that the world would have come out differently, and that's the 01:22:16.000 |
situation I wanted, but instead it came out this other way -- that's regret. Or 01:22:22.000 |
frustration: understanding, okay, I've tried a bunch of times, I thought 01:22:24.000 |
this would work, but it doesn't seem to be working. 01:22:32.000 |
We need these models to understand ourselves, 01:22:34.000 |
and to understand other people -- 01:22:40.000 |
the same kinds of models that I was describing, just the ones I was talking about 01:22:42.000 |
here with these, say, cost-benefit analyses of action. 01:22:44.000 |
So I'm just trying to say, I think 01:22:54.000 |
emotion goes well beyond just, like, "well, I'm feeling good or bad." 01:23:06.000 |
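Here is a toy sketch of that counterfactual computation, my own illustration rather than any model from the talk; the world model, the actions, and the numbers are all made up. Regret falls out of replaying a generative model under the action you didn't take.

```python
# Toy sketch of regret as counterfactual simulation over a generative world model.
# The world model, actions, and payoffs are all invented for illustration.
import random

def world_model(action, rng):
    """A stochastic model of how long the trip takes (minutes) given an action."""
    if action == "take_highway":
        return 30 + rng.gauss(0, 5)   # usually fast, some variance
    else:  # "take_back_roads"
        return 45 + rng.gauss(0, 2)   # slower but predictable

def expected_outcome(action, n=1000, seed=0):
    """Monte Carlo estimate of the outcome under a (possibly untaken) action."""
    rng = random.Random(seed)
    return sum(world_model(action, rng) for _ in range(n)) / n

# What actually happened: I took the back roads and it took 46 minutes.
actual_action, actual_time = "take_back_roads", 46.0

# Counterfactual: replay my model of the world under the action I didn't take.
counterfactual_time = expected_outcome("take_highway")

# Regret is (roughly) how much better I believe the foregone option would have been.
regret = actual_time - counterfactual_time
print(f"Counterfactually ~{counterfactual_time:.0f} min; regret ~{regret:.0f} min")
```

The design point is that the emotion is computed from a model, not just a scalar good/bad signal: you need the machinery to simulate the world under actions you never took.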
Thanks, Josh, for your nice talk. So this is all about 01:23:22.000 |
Sorry, what? Is that what you work on by any chance? 01:23:26.000 |
So in the Center for Brains, Minds, and Machines, 01:23:40.000 |
people like Rebecca Saxe study this with functional brain imaging, 01:23:42.000 |
or the more detailed circuitry, which usually 01:23:44.000 |
requires recording from, say, non-human brains -- 01:23:52.000 |
although that's not mostly what I work on, right? 01:23:56.000 |
As in many other areas of science, certainly in neuroscience, 01:24:34.000 |
you can't really make sense of behavior, without a sense of the computations 01:25:10.000 |
underneath. It's striking when you look at the brain at the hardware level 01:25:14.000 |
versus the software level, right? But when you look at the hardware level, 01:25:26.000 |
the components are slow, but the computations of intelligence are very fast. 01:25:34.000 |
How does such slow hardware produce intelligent behavior so quickly? That's a great mystery, and I think 01:25:44.000 |
maybe most important is the power consumption, 01:25:50.000 |
the power consumption, the power that the brain runs on. 01:25:58.000 |
Again, she's doing an internship here; she literally 01:26:08.000 |
powered this project. So somehow she turned a burrito into 01:26:16.000 |
all of that computation. Whereas if you look at the power that we consume 01:26:28.000 |
in our machines, we're still very far from the power of the human brain computationally. 01:27:10.000 |
People are really starting to think about this. How can we, say, for example, 01:27:16.000 |
do computing that's very, very low power, but maybe only approximate? 01:27:30.000 |
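For a rough sense of the gap, here is a back-of-the-envelope calculation; the figures are commonly cited ballpark numbers of my own choosing (about 20 watts for the brain, a 500-kilocalorie burrito, a 300-watt GPU), not numbers from the talk.

```python
# Back-of-the-envelope: how long does one burrito run a brain vs. a GPU?
# All numbers are rough, commonly cited figures, not from the talk.
BRAIN_WATTS = 20          # typical estimate for the human brain
GPU_WATTS = 300           # a single modern datacenter GPU, roughly
BURRITO_KCAL = 500        # a modest burrito
JOULES_PER_KCAL = 4184

energy_j = BURRITO_KCAL * JOULES_PER_KCAL   # ~2.1 MJ in one burrito

hours_of_brain = energy_j / BRAIN_WATTS / 3600
hours_of_gpu = energy_j / GPU_WATTS / 3600

print(f"One burrito: ~{hours_of_brain:.0f} h of brain ({BRAIN_WATTS} W), "
      f"~{hours_of_gpu:.1f} h of one GPU ({GPU_WATTS} W)")
# ~29 h of brain vs ~1.9 h of a single GPU -- and large training runs use
# thousands of GPUs at once.
```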
There's a company called Singular Computing, and they have some 01:27:32.000 |
very interesting ideas, including some actual 01:27:44.000 |
hardware. They want to build something that's about the size of this table 01:27:46.000 |
but that has a billion cores, a billion cores, 01:27:50.000 |
with a reasonable kind of power consumption. I would love to have 01:27:54.000 |
one. If anybody wants to help Joe build it, I think he'd love to talk 01:28:00.000 |
to you. At Google X, people are working on similar things. 01:28:08.000 |
You might not have thought you were interested in the brain, but if you want to build hardware like that, it's a good place to look. 01:28:42.000 |
Yeah, I don't know. I'll start with a billion cores, 01:28:50.000 |
but it's hard to answer the question in a way that's software independent. 01:29:00.000 |
When you say how far are we, you mean how far 01:29:08.000 |
away, like they might ask if I were working at DeepMind? 01:29:18.000 |
I mean, again, this goes back to another reason 01:29:22.000 |
If you look at what we currently call neural networks, 01:29:36.000 |
the units are very simple, but I think it's just as likely that a neuron 01:29:40.000 |
is something like a computer in itself. 01:29:54.000 |
I think it's like 10 billion cortical pyramidal neurons in a human brain. 01:30:14.000 |
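If you take the "neuron as a small computer" picture literally, a quick and purely illustrative calculation connects it to the billion-core machine mentioned above; the premise, not the arithmetic, is the speculative part.

```python
# If each cortical pyramidal neuron were a small computer, how many
# billion-core machines would you need? Purely illustrative arithmetic.
PYRAMIDAL_NEURONS = 10_000_000_000  # ~10 billion, the figure from the talk
SINGULAR_CORES = 1_000_000_000      # the table-sized, billion-core machine idea

machines_needed = PYRAMIDAL_NEURONS / SINGULAR_CORES
print(f"You'd need ~{machines_needed:.0f} such machines for one core per neuron")
# => ~10 machines -- and that's before asking what program each core runs,
# which is the software question this whole talk is about.
```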
So to your question, I don't think we're going to get to what 01:30:16.000 |
I'm talking about, or anything like a real brain, anytime soon. 01:30:32.000 |
The video game engine in your head, that's a similar 01:30:34.000 |
thing: that was driven by the video game industry. 01:30:38.000 |
Not that we should all play as many video games as we can, 01:30:48.000 |
but, for example, there's a company called Improbable, a 01:30:54.000 |
sizable startup at this point, which is building 01:31:02.000 |
on an idea for very, very big distributed computing: 01:31:08.000 |
simulations of the world for much more interesting, immersive games. 01:31:12.000 |
That's one thing that might -- hopefully that will lead to more of what we need. 01:31:48.000 |
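To tie this back to the game engine in your head: prediction by simulation just means running a cheap forward model many times from uncertain initial conditions and counting outcomes. A minimal sketch, entirely my own and with invented physics parameters:

```python
# Prediction by simulation: estimate "will the ball clear the wall?" by running
# a tiny physics engine forward from noisy initial conditions. Illustrative only.
import random

def simulate(vx, vy, wall_x=10.0, wall_h=2.0, dt=0.01, g=9.8):
    """Step 2D projectile motion; return True if the ball clears the wall."""
    x = y = 0.0
    while x < wall_x:
        x += vx * dt
        y += vy * dt
        vy -= g * dt
        if y < 0:            # hit the ground before reaching the wall
            return False
    return y > wall_h

def predict(n=1000, seed=0):
    """Monte Carlo probability of success under noisy perception of the throw."""
    rng = random.Random(seed)
    hits = sum(simulate(vx=rng.gauss(8, 0.5), vy=rng.gauss(9, 0.5))
               for _ in range(n))
    return hits / n

print(f"P(clears the wall) ~ {predict():.2f}")
```

The simulator is approximate and stochastic by design; the probability comes from Monte Carlo over the perceptual noise, not from any closed-form prediction.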
Like, you said the cognitive level is learning how to 01:31:50.000 |
predict, but I'm not sure what you mean by that. 01:31:52.000 |
There's many things you could mean, and what our cognitive 01:31:54.000 |
science is about is figuring out which of those versions is right -- 01:31:56.000 |
like, I don't think it's just learning how to predict. I think 01:32:00.000 |
it's learning to model the world, to plan actions and to -- you know, all those things. 01:32:06.000 |
There are situations you would never predict because they would never 01:32:12.000 |
have come up before; that's more than predicting -- that's when your model can generalize. 01:32:16.000 |
Even the circuits you are interested in, a few hundred neurons in 01:32:18.000 |
prefrontal cortex, they could generalize a lot. 01:32:30.000 |
Do they generalize the way the model does? For sure, because that's at the abstract level. 01:32:36.000 |
I mean, what does it mean to say that some neurons 01:32:40.000 |
generalize? One way to put this is to say, look, we have a certain math -- you 01:32:44.000 |
could call it abstract, or I call it the software level -- and 01:32:48.000 |
all engineering is based on some kind of abstraction. 01:32:50.000 |
But you might have a circuit level abstraction. 01:32:56.000 |
And I'm mostly working at, or starting from, a more 01:32:58.000 |
software level of abstraction. They're all abstractions. 01:33:10.000 |
If I look at some neurons, how do I know what program they're implementing? 01:33:14.000 |
From the circuitry alone, could I tell what program they're implementing? 01:33:16.000 |
Well, maybe, but certainly it would be a lot easier 01:33:18.000 |
if I knew something about what programs they might be implementing 01:33:20.000 |
before I start to look at the circuitry. If I just look blindly, it's much harder. 01:33:46.000 |
We understand best the neurons that are closest to the inputs and outputs, 01:33:58.000 |
because we know what those things already are, so we can make sense of what they're doing. 01:34:04.000 |
But if you want to talk about flexible planning, 01:34:14.000 |
I don't think that just by recording from neurons we're going to be able to answer those questions in a way where you'd say, 01:34:22.000 |
"Yeah, okay, I get it. I get those insights in a way 01:34:24.000 |
that I can engineer with." And that's what my goal is, 01:34:28.000 |
whether at the software level, the hardware level, or the entire 01:34:34.000 |
system -- by taking what we're doing and bringing it into contact 01:34:36.000 |
with people studying neural circuits. But I don't 01:34:40.000 |
think you can skip the software level and just go straight to the neural circuits. And I think 01:34:46.000 |
our work can help the people working at the neural circuit level. And they can help us 01:34:48.000 |
address these other engineering questions that we don't