
MIT AGI: Building machines that see, learn, and think like people (Josh Tenenbaum)


Chapters

0:00
10:14 A long-term research roadmap
17:15 CBMM Architecture of Visual Intelligence
40:02 The Roots of Common Sense
52:35 The intuitive physics engine
55:19 Prediction by simulation

Transcript

Today we have Josh Tenenbaum. He's a professor here at MIT, leading the computational cognitive science group. Among many other topics in cognition and intelligence, he is fascinated with the question of how human beings learn so much from so little, and how these insights can lead to building AI systems that are much more efficient at learning from data.

So please give Josh a warm welcome. All right. Thank you very much. Thanks for having me. I'm excited to be part of what looks like really quite a very impressive lineup, especially starting after today. And it's, I think, quite a great opportunity to get to see perspectives on artificial intelligence from many of the leaders in industry and other entities working on this great quest.

So I'm going to talk to you about some of the work that we do in our group, but also I'm going to try to give a broader perspective reflective of a number of MIT faculty, especially those who are affiliated with the Center for Brains, Minds, and Machines. So you can see up there on my affiliation.

Academically, I'm part of Brain and Cognitive Science, or Course 9. I'm also part of CSAIL. But I'm also part of the Center for Brains, Minds, and Machines, which is an NSF-funded center, science and technology center, which really stands for the bridge between the science and the engineering of intelligence.

It literally straddles Vassar Street in that we have CSAIL and BCS members. We also have partners at Harvard and other academic institutions. And again, what we stand for, I want to try to convey some of the specific things we're doing in the center and where we want to go with a vision that really is about jointly pursuing the science, the basic science of how intelligence arises in the human mind and brain, and also the engineering enterprise of how to build something increasingly like human intelligence in machines.

And we deeply believe that these two projects have something to do with each other and are best pursued jointly. Now, it's a really exciting time to be doing anything related to intelligence or certainly to AI for all the reasons that, you know, brought you all here. I don't have to tell you this.

We have all these ways in which AI is kind of finally here. We finally live in the era of something like real practical AI. Or for those who've been around for a while and have seen some of the rises and falls, you know, AI is back in a big way.

But from my perspective, and I think maybe this reflects, you know, why we distinguish what we might call AGI from AI, we don't really have any real AI, basically. We have what I like to call AI technologies, which are systems that do things we used to think that only humans could do, and now we have machines that do them, often quite well, maybe even better than any human who's ever lived, right, like a machine that plays Go.

But none of these systems, I would say, are truly intelligent. None of them have anything like common sense. They have nothing like the flexible, general-purpose intelligence that each of you might use to learn every one of these skills or tasks, right? Each of these systems had to be built by large teams of engineers, working together often for a number of years, often at great cost to somebody who's willing to pay for it.

And each of them just does one thing. So AlphaGo might beat the world's best, but it can't drive to the match or even tell you what Go is. It can't even tell you that Go is a game because it doesn't even know what a game is, right? So what's missing?

What is it that makes every one of your brains-- maybe you can't beat the world's best in Go, but any one of you can get behind the wheel of a car. I think of this because my daughter is going to turn 16 tomorrow. If she lived in California, she'd have a driver's license.

It's a little bit down the line for us here in Massachusetts. But she didn't have to be specially engineered by billion-dollar startups, and she got really into chess recently, and now she's taught herself chess by playing just a handful of games, basically. And she can do any one of these activities, and any one of us can.

So what is it? What makes up the difference? Well, there's many things, right? I'll talk about the focus for us in our research -- and for a lot of us, again, in CBMM -- which is summarized here. What drives the successes right now in AI, especially in industry, okay, in all these AI technologies, is many, many things.

But where the progress has been made most recently and what's getting most of the attention is, of course, deep learning, but other kinds of machine-learning technologies which essentially represent the maturation of a decades-long effort to solve the problem of pattern recognition. That means taking data and finding patterns in the data that tell you something you care about, like how to label a class or how to predict some other signal, okay?

And pattern recognition is great. It's an important part of intelligence, and it's reasonable to say that deep learning as a technology has really made great strides on pattern recognition and maybe even, you know, has come in close to solving the problems of pattern recognition. But intelligence is about many other things.

Intelligence is about a lot more. In particular, it's about modeling the world. And think about all the activities that a human does to model the world that go beyond just, say, recognizing patterns in data, but actually trying to explain and understand what we see, for instance, okay? Or to be able to imagine things that we've never seen, maybe even very different from anything we've ever seen, but might want to see, and then to set those as goals, to make plans and solve problems needed to make those things real.

Or thinking about learning, again, you know, some kinds of learning can be thought of as pattern recognition if you're learning sufficient statistics or weights in a neural net that are used for those purposes. But many activities of learning are about building out new models, right, either refining, reusing, improving old models, or actually building fundamentally new models as you experience more of the world.

And then think about sharing our models, communicating our models to others, modeling their models, learning from them. All these activities of modeling are at the heart of human intelligence, and they require a much broader set of tools. So I want to talk about the ways we're studying these activities of modeling the world, and to say something, in a pretty non-technical way, about the kinds of tools that we need to capture these abilities.

Now, I think it's-- I want to be very honest up front and to say this is just the beginning of a story, right? When you look at deep learning successes, that itself is a story that goes back decades. I'll say a little bit about that history in a minute.

But where we are now is just looking forward to a future when we might be able to capture these abilities, you know, at a really mature engineering scale. And I would say we are far from being able to capture all the ways in which humans richly, flexibly, quickly build models of the world at the kind of scale that, say, Silicon Valley wants, either big tech companies like Google or Microsoft or IBM or Facebook or small startups, right?

We can get there. And I think what I want to talk to you about here is one route for trying to get there, and this is the route that CBMM stands for, the idea that by reverse engineering how intelligence works in the human mind and brain, that will give us a route to engineering these abilities in machines.

When we say reverse engineering, we're talking about science, but doing science like engineers. That's our fundamental principle: that if we approach cognitive science and neuroscience like an engineer, where, say, the output of our science isn't just a description of the brain or the mind in words, but in the same terms that an engineer would use to build an intelligent system, then that will be the basis both for a much more rigorous and deeply insightful science and for the direct translation of those insights into engineering applications.

Now, I said before I talk a little about history, what I mean by that is this. Again, if part of what brought you here is deep learning, and I know even if you've never heard of deep learning before, which I'm sure is unlikely, you saw a good spectrum of that in the overview session last night.

It's really interesting and important to look back on the history of where did techniques for deep learning come from, or reinforcement learning. Those are the two tools in the current machine learning arsenal that are getting the most attention, things like back propagation or end-to-end stochastic gradient descent or temporal difference learning or Q-learning.

Here's a few papers from the literature. Maybe some of you have read these original papers. Here's the original paper by Rumelhart, Hinton, and colleagues in which they introduced the back propagation algorithm for training multilayer perceptrons, multilayer neural networks. Here's the original perceptron paper by Rosenblatt, which introduced the one-layer version of that architecture and the basic perceptron learning algorithm.

Here's the first paper on the temporal difference learning method for reinforcement learning from Sutton and Barto. Here's the original Boltzmann machine paper, also by Hinton and colleagues, which, for those of you who don't know that architecture, you can think of as a kind of probabilistic, undirected, multilayer perceptron. Or, for example, before there were LSTMs, if you know about current recurrent neural network architectures, earlier, much simpler versions of the same idea were proposed by Jeff Elman and his simple recurrent networks.

The reason I want to put up the original papers here is for you to look at both when they were published and where they were published. So if you look at the dates, you'll see papers going back to the '80s, but even the '60s, or even the 1950s. And look at where they were published.

Most of them were published in psychology journals. So the journal Psychological Review, if you don't know it, is like the leading journal of theoretical psychology and mathematical psychology. Or Cognitive Science, the journal of the Cognitive Science Society. Or the backprop paper was published in Nature, which is a general interest science journal, but by people who are mostly affiliated with the Institute for Cognitive Science in San Diego.

So what you see here is already a long history of scientists thinking like engineers. These are people who are in psychology or cognitive science departments and publishing in those places, but by formalizing even very basic insights about how humans might learn, or how brains might learn, in the right kind of math, that led to, of course, progress on the science side, but it led to all the engineering that we see now.

It wasn't sufficient, right? We needed, of course, lots of innovations and advances in computing hardware and software systems, right? But this is where the basic math came from, and it came from doing science like an engineer. So what I want to talk about in our vision is, what does the future of this look like?

If we were to look 50 years into the future, what would we be looking back on now, or over this time scale? Well, here's a long-term research roadmap that reflects some of my ambitions and some of our center's goals, and many others, too. We'd like to be able to address basic questions, questions of what it is to be and to think like a human, questions, for example, of consciousness, or meaning and language, or real learning, right?

Questions like, even beyond the individual, like questions of culture, or creativity. Those are our big ideas up there. And for each of these, there are basic scientific questions. How do we become aware of the world and ourselves in it? It starts with perception, but it really turns into awareness, awareness of yourself and of the world, and what we might call consciousness, right?

How does a word start to have a meaning? What really is a meaning, and how does a child grasp it? Or how do children actually learn? What do babies' brains actually start with? Are they blank slates, or do they start with some kind of cognitive structure? And then what does real learning look like?

These are just some of the questions that we're interested in working on. Or when we talk about culture, we mean, how do you learn all the things you didn't directly experience, right, but that somehow you got from the accumulation of knowledge in society over many generations? Or how do you ever think of new ideas, or answers to new questions?

How do you think of the new questions themselves? How do you decide what to think about? These are all key activities of human intelligence. When we talk about how we model the world, where our models come from, what we do with our models, this is what we're talking about.

And if we could get machines that could do these things, well, again, on the bottom row, think of all the actual real engineering payoffs. Now, in our center -- in both my own activities and a lot of what my group does these days, and what a number of other colleagues in the Center for Brains, Minds, and Machines do, as well as, you know, very broadly, people in BCS and CSAIL -- there's one place where we work on the beginnings of these problems in the near term. What I've just described is the long term -- think 50 years, okay, maybe shorter, maybe longer, I don't know, but think well beyond 10 years.

But in the short term, 5 to 10 years, a lot of our focus is around visual intelligence, and there's many reasons for that. Again, we can build on the successes of deep networks, and a lot of pattern recognition and machine vision. It's a good way to put these ideas into practice.

When we look at the actual brain, the visual system in the brain, in the human and other mammalian brains, for example, is really, very clearly the best understood part of the brain, and at a circuit level, it's the part of the brain that's most inspired current deep learning and neural network systems.

But even there, there's things which we still don't really understand like engineers. So here's an example of a basic problem in visual intelligence that we and others in the Center are trying to solve. Look around you, and you feel like there's a whole world around you, and there is a whole world around you, and you feel like your brain captures it.

But the actual sense data that's coming in through your eyes looks more like this photograph here, where you can see there's a crowd scene, but it's mostly blurry except for a small region of high resolution in the center. So that corresponds biologically to what part of the image is in your fovea.

That's the central region of cells in the retina, where you have really high resolution visual data. The size of your fovea is roughly like if you hold out your thumb at arm's length, it's a little bit bigger than that, but not much bigger. Most of the image, in terms of the actual information coming in in a bottom-up sense to your brain, is really quite blurry.

But somehow by looking at just one part, and then by saccading around or making a few eye movements, you get a few glimpses, each not much bigger than the size of your thumb at arm's length. Somehow you stitch that information together into what feels like and really is a rich representation of the whole world around you.

And when I say around you, I mean literally around you. So here's another kind of demonstration. Without turning around, nobody's allowed to turn around, ask yourself, what's behind you? Now the answer's going to be different for different people, depending on where you're sitting. For most of you, you might think, well, I think there's a person pretty close behind me.

You know you're in a crowded auditorium, although you haven't seen that person. You know that they're there. For people in the very back row, you know there isn't a person behind you, and you're conscious of being in the back row. You might be conscious that there's a wall right behind you.

But now for the people who are in the room, not in the very back, think about how far behind you is the back, like where's the nearest wall behind you? So we can, maybe we can call out, try a little demonstration. So I don't know, I'm pointing to someone there.

Can you see, say something if you think I'm pointing at you. Well, I could have been pointing at you, but I'm pointing to someone behind you, okay. I'll point to you, yeah, I'm pointing to you. All right, so how far is the nearest wall? No, you can't turn around, you've blown your chance.

(audience laughing) Without turning around, okay, so you were, okay, do you see I'm pointing to you there with the tie? Okay, so without turning around, how far is the nearest wall behind you? That's, sorry, how far? Five meters, okay, well, I mean, that might be about right. No, no, other people can turn around.

Now, how about you, how far is the nearest wall behind you? 10 meters, okay, that might be right, yeah. How about here, what do you think? 20, okay. So yeah, since I didn't grow up in the metric system, I barely know, but the point is that each of you is surely not exactly right, but you're certainly within an order of magnitude, and I guess if we actually tried to measure, my guess is you're probably right to within 50% or less, often maybe just 20% error.

Okay, so how do you know this? I mean, even if it's not, what did you say, 20 meters? Even if it's not 20 meters, it's probably closer to 20 meters than it is to five or 10 meters, or than it is to 50 meters. So how do you know this?

You haven't turned around in a while, right? But some part of your brain is tracking the whole world around you, right? And how many people are behind you? Yeah, like a few hundred, right? I mean, I don't know if it's 200 or 300, but it's not 1,000, I don't think so, and it's certainly not 10 or 20 or 50, right?

So you track these things, and you use them to plan your actions. Okay, so again, think about how instantly, effortlessly, and very reliably, okay, your brain computes all these things, so the people and objects around you, and it's not just, you know, approximations. Certainly, when we're talking about what's behind you in space, there's a lot of imprecision, but when it comes to reaching for things right in front of you, very precise shape and physical property estimates needed to pick up and manipulate objects, and then when it comes to people, it's not just the existence of the people, but something about what's in their head, right?

You track whether someone's paying attention to you when you're talking to them, what they might want from you, what they might be thinking about you, what they might be thinking about other people, okay? So when we talk about visual intelligence, this is the whole stuff we're talking about, and you can start to see how it turns into basic questions, I think, of what we might call the beginnings of consciousness, or at least our awareness of ourself in the world, and of ourselves as a self in the world, but also other aspects of higher-level intelligence and cognition that are not just about perception, like symbols, right, to describe, even to ourselves, what's around us and where we are and what we can do with it.

You have to go beyond just what we would normally call the stuff of perception to, say, the thoughts in somebody's head and your own thoughts about that, okay? So what we've been doing in CBMM is trying to develop an architecture for visual intelligence, and I'm not going to go into any of the details of how this works, and this is just notional, this is just a picture, it's like just a sketch from a grant proposal of what we say we want to do, but it's based on a lot of scientific understanding of how the brain works.

There are different parts of the brain that correspond to these different modules in our architecture, as well as some kind of emerging engineering way to try to capture at the software and maybe even hardware levels how these modules might work. So we talk about sort of an early module of a visual or perceptual stream, which is like bottom-up visual or other perceptual input.

That's the kind of thing that is pretty close to what we currently have in, say, deep convolutional neural networks. But then we talk about some kind of-- the output of that isn't just pattern class labels, but what we call the cognitive core, core cognition. So again, an understanding of space and objects, their physics, other people, their minds, that's the real stuff of cognition that has to be the output of perception.

But somehow we have to have-- this is what we call the brain OS in this picture. We have to get there by stitching together the bottom-up inputs from a glimpse here, a glimpse here, a little bit here and there, and accessing prior knowledge from our memory systems to tell us how to stitch these things together into the really core cognitive representations of what's out there in the world.

And then if we're going to start to talk about it in language or to build plans on top of what we have seen and understood, that's where we talk about symbols coming into the picture, the building blocks of language and plans and so on. So now we might say, well, okay, this is an architecture that is brain-inspired and cognitively inspired, and we're planning to turn into real engineering.
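To make that notional pipeline slightly more concrete, here is a purely illustrative Python sketch of how such modules might be composed. None of these class or function names come from an actual CBMM system; they are hypothetical stand-ins for the perceptual stream, the brain OS, the cognitive core, and the symbolic layer described above.

```python
# Toy sketch only: the architecture in the talk is notional, so every name here
# is a made-up placeholder, not part of any published system.
from dataclasses import dataclass, field

@dataclass
class Glimpse:
    location: tuple   # where the "fovea" was pointed
    features: list    # bottom-up features from that glimpse (stand-in for a conv net's output)

@dataclass
class SceneState:
    objects: list = field(default_factory=list)   # cognitive core: objects and their physics
    agents: list = field(default_factory=list)    # other minds: goals, attention, beliefs

def perceive(image_patch) -> Glimpse:
    """Early perceptual stream: bottom-up pattern recognition on a single glimpse."""
    return Glimpse(location=(0, 0), features=[0.0])   # placeholder for a real recognizer

def brain_os_update(state: SceneState, glimpse: Glimpse, memory: dict) -> SceneState:
    """'Brain OS': stitch successive glimpses and prior knowledge into one scene model."""
    state.objects.append({"where": glimpse.location, "what": glimpse.features})
    return state

def plan(state: SceneState, goal: str) -> list:
    """Symbolic layer: turn the core scene representation into a sequence of steps."""
    return [f"step toward '{goal}' given {len(state.objects)} known objects"]

state = SceneState()
for patch in [None, None, None]:          # a few saccades / glimpses
    state = brain_os_update(state, perceive(patch), memory={})
print(plan(state, goal="describe the scene"))
```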

And you can say, well, do we need that? Maybe--again, I know this is a question you considered in the first lecture. Maybe the engineering toolkit that's currently been making a lot of progress in, let's say, industry, maybe that's good enough. Maybe let's take deep learning, but to stand for a broader set of modern pattern-recognition-based and reinforcement-learning-based tools and say, okay, well, maybe that can scale up to this.

And you might--maybe that's possible. I'm happy in the question period if people want to debate this. My sense is no. When I say no, I don't mean, like, it can't happen or it won't happen. What I mean is that the highest-value, highest-expected-value route right now is to take this more science-based reverse-engineering approach, and that at least if you follow the current trajectory that industry incentives especially optimize for, it's not even really trying to take us to these things.

So think about, for example, a case study of visual intelligence that is in some ways as pattern recognition very much of a success. It's, again, been mostly driven by industry. It's something that if you read in the news or even play around with in certain publicly available data sets, feels like we've made great progress.

And this is an aspect of visual intelligence, which is sometimes called image captioning or mapping images to text. You know, basically there's been a bunch of systems. Here's a couple of press releases. This one's about Google. Google's AI can now caption images almost as well as humans. Here's one about Microsoft.

A couple of years ago, I think, there were something like eight papers all released onto arXiv around the same time from basically all the major industry computer vision groups as well as a couple of academic partners. All of them, driven by basically the same data set produced by some Microsoft researchers and other collaborators, trained a combination of deep convolutional neural networks -- state-of-the-art visual pattern recognition -- with recurrent neural networks, which had recently been developed for basically a kind of neural statistical language modeling, glued them together, and produced systems that achieved very impressive results on a big training set and a held-out test set, where the goal was to take an image and write a short, sentence-like caption that would seem like the kind of way a human would describe that image.
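As a rough illustration of that encoder-decoder pattern (and only that; this is not any of the published systems, and every layer size and name here is made up), a minimal sketch in PyTorch might look like this:

```python
# Minimal sketch of the "CNN encoder + RNN decoder" captioning pattern described
# above; all sizes and names are invented for illustration.
import torch
import torch.nn as nn

class TinyCaptioner(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=128, hidden_dim=256):
        super().__init__()
        # Stand-in for a deep convolutional network (in practice, a large pretrained CNN).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # The image feature becomes the first "word" the LSTM sees; the caption follows.
        img_feat = self.encoder(images).unsqueeze(1)        # (B, 1, embed_dim)
        word_embs = self.embed(captions)                    # (B, T, embed_dim)
        inputs = torch.cat([img_feat, word_embs], dim=1)    # (B, T+1, embed_dim)
        hidden, _ = self.decoder(inputs)
        return self.to_vocab(hidden)                        # logits over the vocabulary

model = TinyCaptioner()
images = torch.randn(2, 3, 64, 64)             # a fake batch of images
captions = torch.randint(0, 1000, (2, 5))      # fake token ids
print(model(images, captions).shape)           # torch.Size([2, 6, 1000])
```

The image feature is handed to the recurrent decoder as if it were the first token, and the decoder then predicts the caption one word at a time; that, roughly, is the "glue" between the two components.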

And these systems surpassed human-level accuracy on the held-out test set from a big training set. But what you can see when you really dig into these things is there's often a lot of what I would call data set overfitting. It's not overfitting to the training set, but it's overfitting to whatever are the particular characteristics of this data set, wherever it came from, a certain set of photographs and certain ways of captioning them -- and even with a big data set, it's not about quantity.

It's more about the quality, the nature of what people are doing. So one way to test this system is to apply it to what seems like basically the same problem, but not within a certain curated or built data set. And there's a convenient Twitter bot that lets you do this.

So there's something called picdescbot, which takes one of the state-of-the-art industry AI captioning systems, a very good one. Again, this is not meant to--I'm not trying to critique these systems for what they're trying to do. I'm just trying to point out what they don't really even try to do.

So this takes the Microsoft CaptionBot, and just every couple of hours takes a random image from the web, captions it, and uploads the results to Twitter. And a couple of months ago, when I prepared a first version of this talk, I just took a few days in the life of this Twitter bot.

I didn't take every single image, but I took, you know, most of the images in a way that was meant to be representative of the successes and the kinds of failures that such a system will make. So we can go through this, and it's a little bit entertaining and I think quite informative.

So here's just a somewhat random sample of a few days in the life of one of these CaptionBots. So here we have a picture of a person holding-- unfortunately, my screen is very small here, and I can't read up there, so maybe you'll have to tell me what it says--but a person holding a cell phone.

I guess I'll just read along with you. So we have a person holding a cell phone. Well, it's not a person holding a cell phone, but it's kind of close. It's a person holding some kind of machine. I don't even know what that is, but it's some kind of musical instrument, right?

So that's a mixed success or failure. Here's a person on a field playing football. I would call that an A result, maybe even A+. Here's a group of people standing on top of a mountain. So less good. There's a mountain, but as far as I can tell, there's no people.

But these systems like to see people because in the data set they were trained on, there's a lot of people, and people often talk about people. And the fact that you can appreciate both what I said and why it's funny-- there you did some of the cognitive activities that this system is not even trying to do.

Okay, here we've got a building with a cake. I'll go through these fast. A building with a cake, a large stone building with a clock tower. I think that's pretty good. I'd give that like a B+. There's no clock, but it's plausibly right. There might be a clock in there.

There's definitely something like that. Here's a truck parked on the side of a building. I don't know, maybe a B-. There is a car on the side of a building, but it's not a truck, and it doesn't seem like the main thing in the image. Here's a necklace made of bananas.

(laughter) A large ship in the water. This is pretty good. I'd give this like an A- or B+, because there is a ship in the water, but it's not very large. It's really more of like a tugboat or something. Here's a sign sitting on the grass. You know, in some sense, that's great.

No, but in another sense, it's really missing what's actually interesting and important and meaningful to humans. Here's a garden that's in the dirt. (laughter) A pizza sitting on top of a building. A small house with a red brick building. I don't know, that's kind of a weird way of saying it.

A vintage photo of a pond. That's good. They like vintage photos. A group of people that are standing in the grass near a bridge. Again, there's two people, and there's some grass, and there's a bridge, but it's really not what's going on. A person in a yard. Okay, kind of.

A group of people standing on top of a boat. There's a boat, there's a group of people, they're standing, but again, the sentence that you see is more based on a bias of what people have said in the past about images that are only vaguely like this. A clock tower is lit up at night.

That's actually, I think, pretty impressive. A large clock mounted to the side of a building. A little bit less so. A snow-covered field. Very good. A building with snow on the ground. A little bit less good. There's no snow. It's white. Some people who--I don't know them, but I bet that's probably right, because identifying faces and recognizing people who are famous because they won medals in the Olympics, probably I would trust current pattern recognition systems to get that.

A painting of a vase in front of a mirror. Less good. I think there's a guy in there, but we didn't get him. A person walking in the rain. Again, there is sort of a person, and there's some puddles, but a group of stuffed animals. A car parked in a parking lot.

That's good. A car parked in front of a building. Less good. A plate with a fork and knife. A clear blue sky. Okay, so you get the idea. Again, if you actually go and play with this system, it may do somewhat better now -- my friends at Microsoft told me they've improved it some.

This is partly for entertainment value; I also chose some of the funnier examples. I want to be quite honest about this. I'm not trying to take away from what are impressive AI technologies, but I think it's clear that there's a sense of understanding of any one of these images that these systems are missing. It's important to see that if a system can make the kinds of errors that this one makes, then even when it seems to be correct, it's probably not doing what you're doing, and it's probably not even trying to scale towards the dimensions of intelligence that we think about when we're talking about human intelligence.

Another way to put this--I'm going to show you a really insightful blog post from one of your other speakers. In a couple of days, I'm not sure, you're going to have Andrej Karpathy, who's one of the leading people in deep learning. This is a really great blog post he wrote a couple of years ago when he was, I think, still at Stanford.

He got his PhD from Stanford. He worked at Google a little bit on some early big neural net AI projects there. He was one of the founding members of OpenAI. Recently, he joined Tesla as their director of AI research. About five years ago, he was looking at the state of computer vision from a human intelligence point of view and lamenting how far away we were.

This is the title of his blog post: "The State of Computer Vision and AI: We Are Really, Really Far Away." He took this image, which was a famous image in its own right. It was a popular image of Obama back when he was president, playing around as he liked to do when he was on tour.

If you take a look at this, you can see you probably all can recognize the previous president of the United States, but you can also get the sense of where he is and what's going on. You might see people smiling, and you might get the sense that he's playing a joke on someone.

Can you see that? How do you know that he's playing a joke and what that joke is? As Andrej goes on to talk about in his blog post, if you think about all the things that you have to really deploy in your mind to understand that, it's a huge list.

Of course, it starts with seeing people and objects and maybe doing some face recognition, but you have to do things like, for example, notice his foot on the scale and understand enough about how scales work that when a foot presses down, it exerts force, that the scale is sensitive.

It doesn't just magically measure people's weight, but it does that somehow through force. You have to see who can see that he's doing that and who cannot see that he's doing that, in particular the person on the scale, and why some people can see that he's doing that and can see that some other people can't see it, why that makes it funny to them.

Someday we should have machines that can understand this, but hopefully you can see why the kind of architecture that I'm talking about would be the building blocks or the ingredients to be able to get them to do that. Again, I prepared a version of this talk a few months ago, and I wrote to Andrej and I said I was going to use this, and I was curious if he had any reflections on this and where he thought we were relative to five years ago, because certainly a lot of progress has been made.

But he said--here's his email. I hope he doesn't mind me sharing it, but again, he's a very honest person, and that's one of the many reasons why he's such an important person right now in AI. He's both very technically strong and honest about what we can do, what we can't do, and as he says--what does he say?

"It's nice to hear from you. It's fun you should bring this up. I was also thinking about writing a return to this." And in short, basically, I don't believe we've made very much progress. He points out that in his long list of things that you'd need to understand the image, we have made progress on some--the ability to, again, detect people and do face recognition for well-known individuals.

But that's kind of about it. And he wasn't particularly optimistic that the current route that's being pursued in industry is anywhere close to solving or even really trying to solve these larger questions. If we give this image to that caption bot, what we see is, again, represents the same point.

So here's the caption bot. It says, "I think it's a group of people standing next to a man in a suit and tie." So that's right as far as it goes. It just doesn't go far enough, and the current ideas of build a data set, train a deep learning algorithm on it, and then repeat aren't really even, I would venture, trying to get to what we're talking about.

Or here's another--I'll just give you one other example of a couple of photographs from my recent vacation in a nice, warm, tropical locale, which I think illustrate, again, the gap: we have machines that can, say, beat the world's best at Go, but can't even beat a child at tic-tac-toe.

Now, what do I mean by that? Well, of course, we don't even need reinforcement learning or deep learning to build a machine that can win or tie, do optimally in tic-tac-toe. But think about this. This is a real tic-tac-toe game, which I saw on the grass outside my hotel.

What do you have to do to look at this and recognize that it's a tic-tac-toe game? You have to see the objects. You have to see what's--in some sense, there's a 3-by-3 grid, but it's only abstract. It's only delimited by these ropes or strings. It's not actually a grid in any simple geometric sense.

But yet a child can look at that--and indeed, here's an actual child who was looking at it--and recognize, "Oh, it's a game of tic-tac-toe," and even know what they need to do to win, namely put the X and complete it, and now they've got three in a row. That's literally child's play.

You show this sort of thing, though, to one of these image-understanding caption bots, and I think it's a close-up of a sign. Again, saying that this is a close-up of a sign is not the same thing, I would venture, as a cognitive or computational activity that's going to give us what we need to, say, recognize the object, to recognize it as a game, to understand the goal, and how to plan to achieve those goals.

Whereas this kind of architecture is designed to try to do all of these things, ultimately. I bring in these examples of games or jokes to really show where perception goes to cognition, all the way up to symbols. So to get objects and forces and mental states, that's the cognitive core, but to be able to get goals and plans and what do I do or how do I talk about it, that's symbols.

Here's another way into this, and it's one that also motivates, I think, a lot of really good work on the engineering side, and a lot of our interest in the science side, is think about robotics and think about what do you have to do to -- what does the brain have to be like to control the body?

So again, you're going to hear from, shortly, I think maybe it's next week, from Marc Raibert, who's one of the founders of Boston Dynamics, which is one of my favorite companies anywhere. They're without doubt the leading maker of humanoid robots, legged locomoting robots in industry. They have all sorts of other really cool robots, robots like dogs, robots that have -- I think you'll even get to see a live demonstration of one of these robots.

It's really awesome, impressive stuff. But what about the minds and brains of these robots? Well, again, if you ask Mark, ask them how much of human-like cognition do they have in their robots, and I think he would say very little. In fact, we have asked him that, and he would say very little.

He has said very little. He's actually one of the advisors of our center, and I think in many ways we're very much on the same page. We both want to know, how do you build the kind of intelligence that can control these bodies like the way a human does?

Here's another example of an industry robotics effort. This is Google's Arm Farm, where they've got lots of robot arms, and they're trying to train them to pick up objects using various kinds of deep learning and reinforcement learning techniques. I think it's one approach. I just think it's very, very different from the way humans learn to, say, control their body and manipulate objects.

You can see that in terms of things that go back to what you were saying when you were introducing me. Think about how quickly we learn things. Here you have the Arm Farmers trying to generate, effectively, maybe if not infinite, but hundreds of thousands, millions of examples of reaches and pickups of objects, even with just a single gripper.

Yet a child, who in some ways can't control their body nearly as well as robots can be controlled at the low level, is able to do so much more. I'll show you two of my favorite videos from YouTube here, which motivate some of the research that we're doing. The one on the left is a one-and-a-half-year-old, and the other one's a one-year-old.

Just watch this one-and-a-half-year-old here doing a popular activity for many kids. Is it playing? Hmm. Do you see a video up there? Hmm. Okay, there we go. Okay, so he's doing this stacking cup activity. He's stacking up cups to make a tall tower. He's got a stack of three, and what you can see from the first part of this video is it looks like he's trying to make a second stack that he's trying to pick up at once.

Basically, he's trying to make a stack of two that'll go on the stack of three. And he's trying to debug his plan, because it got a little bit stuck here. And think about it. I mean, again, if you know anything about robots manipulating objects, even just what he just did, no robot can decide to do that and actually do it.

At some point, he's almost got it. It's a little bit tricky, but at some point he's going to get that stack of two. He realizes he has to move that object out of the way. Look at what he just did. Move it out of the way, use two hands to pick it up.

And now he's got a stack of two on a stack of three, and suddenly, you know, sub-goal completed. He's now got a stack of five. He's got a stack of 10, because he knows he accomplished a key waypoint along the way to his final goal. That's a kind of early symbolic cognition, right?

To understand that I'm trying to build a tall tower, but a tower is made up of little towers. And you can take a tower and put it on top of another tower, or stack a stack on a stack, and you have a bigger stack. So think about how he goes from bottom-up perception to the objects, to the physics needed to manipulate the objects, to the ability to make even those early kinds of symbolic plans.

At some point, he keeps doing this. He puts another stack on there. I'll just jump to the end. Oops, sorry, you missed--sorry. He gets really excited, and he gives himself another big hand, but falls over. OK, again, Boston Dynamics now has robots that could pick themselves up after that.

That's really impressive, again. But all the other stuff to get to that point, we don't really know how to do in a robotic setting. Or think about this baby here. This is a younger baby. This is one of the Internet's very most popular videos because it features a baby and a cat.

(laughter) But the baby's doing something interesting. He's got the same cups, but he's decided-- he's, again, decided to try a new thing. So think about creativity. He's decided that his goal is to stack up cups on the back of a cat, I guess. He's asking, "How many cups can I fit on the back of a cat?" Well, three.

Let's see, can I fit more? Let's try another one. OK, well, he can't fit more than three, it turns out. And then he-- then, ugh, it's not working. So he changes his goal. Now his goal appears to be to get the cups on the other side of the cat.

Now watch that part when he reaches back behind him there. That's--I'll just pause it there for a moment. So when he just reached back there, that's a particularly striking moment in the video. It shows a very strong form of what we call in cognitive science object permanence. That's the idea that you represent objects as these permanent, enduring entities in the world, even when you can't see them.

In this case, he hadn't seen or touched that object behind him for, like, at least a minute, right? Maybe much longer, I don't know. And yet he still knew it was there, and he was able to incorporate it in his plan. There's a moment before that when he's about to reach for it, but then he sees this other one, right?

And it's only when he's now exhausted all the other objects here that he can see, he's like, "OK, now time to get this object and bring it into play," right? So think about what has to be going on in his brain for him to be able to do that, right?

That's like the analog of you understanding what's behind you. It's not that these things are impossible to capture machines. Far from it. It's just that training a deep neural network or any kind of pattern recognition system, we don't think is going to do it. But we think by reverse engineering how it works in the brain, we might be able to do it.

I believe we can do it. It's not just humans that do this kind of activity. Here's a couple of, again, rather famous videos. You can watch all of these on YouTube. Crows are famous object manipulators and tool users, but also orangutans, other primates, rodents. We can watch--here, let me pause this one for a second.

If we watch this orangutan here, he's got a bunch of big Legos, and over the course of this video, he's building up a stack of Legos. It's really quite impressive. I'm just jumping to the end. There's actually some controversy out there of whether this video is a fake. But the controversy isn't about-- it's not like whether it was, I don't know, done with computer animation.

Some people think the video was actually filmed backwards, that a human built up the stack, and the orangutan just slowly disassembled it piece by piece. And it turns out it's remarkably hard to tell whether it's played forward or backwards in time, and people have argued over little details. Because it would be quite impressive if an orangutan actually was able to build up this really impressive stack of Legos.

But I would submit that it would be almost as impressive if he disassembled it. Think about the activity. I mean, if I wanted to disassemble that, the easiest thing to do would just be to knock it over. But to piece by piece disassemble it, even if it's played backwards like this, that's still a really impressive act of symbolic planning on physical objects.

Or here you've got this famous mouse. This you can find on the internet under the "Mouse vs. Cracker" video. And what you'll see here over the course of this video is a mouse valiantly and mostly hopelessly struggling with a cracker that they're hoping to bring back to their nest.

I guess it's a very appealing big meal. And at some point, after just trying and trying to get it over the wall and not being able to do it, the mouse just gives up because it's just never going to happen.

And he just goes away. Except that because even mouses can dream, or mice can dream, at some point he decides, "Okay, I'm just going to come back for one more try." And he tries one more time, and this time valiantly gets it over. Isn't that very impressive? Congratulations, mouse.

You can clap for me at the end, or clap for whoever later. But I want to applaud the mouse there every time I see that. But again, think what had to be going on in his brain to be able to do that. It's a crazy thing, and yet he formulated the goal and was able to achieve it.

I'll just show one more video that is really more about science. These other ones are, some of them actually were from scientific experiments. But this is one that motivates a lot of the science that I do, and to me it sets up a grand cognitive science challenge for AI and robotics.

It's from an experiment with humans, again, 18-month-olds or 1-1/2-year-olds. The kids in this experiment were the same age as the first baby I showed you, the one who did the stacking. And 18 months is really a very good age to study if you're interested in intelligence, for reasons we can talk about later if you're interested.

This is from a very famous experiment done by two psychologists, Felix Warneken and Michael Tomasello. It was studying the spontaneous helping behavior of young children. It also contrasted humans and chimps. The punchline is that chimps sometimes do things that are kind of like what this human did, but not nearly as reliably or as flexibly.

I'll show you a particularly unusual situation where human kids had relatively little trouble figuring out what to do or even whether they should do it, whereas basically no chimp did what you're going to see humans sometimes doing here. The experimenter in this movie, and I'll turn on the sound here if you can hear it, the experimenter is the tall guy, and the participant is the little kid in the corner.

There's sound but no words, right? And at some point he stops and then the kid just does whatever they want to do. So watch what he does. He goes over, he opens the cabinet, looks inside, then he steps back and he looks up at Felix and then looks down, and then the action is completed.

I want you to watch it one more time and think about what's got to be going on inside the kid's head to understand this. So it seems like what it looks like to us is the kid figured out that this guy needed help and helped him. And the paper is full of many other situations like this.

This is just one. But the key idea is that the situation is somewhat novel. People have seen people holding books and opening cabinets, but probably it's very rare to see this kind of situation exactly. It's different in some important details from what you might have seen before. And there's other ones in there that are really truly novel because they just made up a machine right there.

But somehow he has to understand, causally, from the way the guy is banging the books against the cabinet -- it's sort of a symbol, but it's also a physical action -- what the guy can do and what he can't do, and then what the kid can do to help.

I'll show this again, but really just watch. The main part I want you to see is I'll just sort of skip ahead. So watch this part here. Let's say I'll just jump. Right now he's about to look up. He looks up and makes eye contact, and then his eyes look down.

So again, he looks up, he looks up, and then a saccade, a sudden rapid eye movement down, down to his hands, up, down. Again, that's this brain OS in action. He's making one glance, small glance, at the big guy's eyes to make eye contact, to see, to get a signal, did I understand what you wanted, and did you register that joint attention.

And then he makes a prediction about what the guy's going to do, so he looks right down. He doesn't just look around randomly. He looks right down to the guy's hands to track the action that he expects to see happening. If I did the right thing to help you, then I expect you're going to put the books there.

So you can see these things happening, and we want to know what's going on inside the mind that guides all of that. So that's this sort of big scientific agenda that we're working on over the next few years, where we think some kind of understanding of human intelligence in scientific terms could lead to all sorts of AI payoffs.

In particular, suppose we could build a robot that could do what this kid and many other kids in these experiments do, to say, "Help you out around the house without having to be programmed or even really instructed, just to kind of get a sense. Oh yeah, you need a hand with that?

Sure, let me help you out." Even 18-month-olds will do that. Sometimes not very reliably or effectively. Sometimes they'll try to help and really do the opposite. But imagine if you could take the flexible understanding of humans' actions, goals, and so on, and make those reliable engineering technology. That would be very useful.

And it would also be related to, say, machines that you could actually start to talk to and trust in some ways, that shared understanding. So how are we going to do this? Well, let me spend the rest of the time talking about how we try to do this. Some of the technology that we're building both in our group and more broadly to try to make these kinds of architectures real.

And I'll talk about two or three technical ideas. Again, not in any detail. One is the idea of a probabilistic program. So this is a kind of a... think of it as a computational abstraction that we can use to capture the common sense knowledge of this core cognition. So when I say we have an intuitive understanding of physical objects and people's goals, how do I build a model of that model you have in the head?

Probabilistic programs, a little bit more technically, are... one way to understand them is as a generalization of Bayesian networks or other kinds of directed graphical models, if you know those. But where instead of defining a probability model on a graph, you define it on a program. And thereby have access to a much more expressive toolkit of knowledge representation.

So data structures, other kinds of algorithmic tools for representing knowledge. But you still have access to the ability to do probabilistic inference, like in a graphical model, but also causal inference in a directed graphical model. So for those of you who know about graphical models, that might make some sense to you.
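To give a flavor of what "defining a probability model on a program" means (just a toy sketch, not code from Church or any of the languages mentioned in a moment), here is a tiny generative program in Python, with conditional inference done by brute-force rejection sampling:

```python
# A minimal sketch of the probabilistic-programming idea: the model is just a
# program that makes random choices, and conditioning is done here by crude
# rejection sampling. Real languages like Church or Pyro provide far better
# inference; none of this code comes from those systems.
import random

def sprinkler_model():
    """Generative program: a tiny 'Bayes net written as code'."""
    rain = random.random() < 0.2
    sprinkler = random.random() < (0.01 if rain else 0.4)
    wet_prob = 0.99 if (rain and sprinkler) else 0.9 if (rain or sprinkler) else 0.0
    grass_wet = random.random() < wet_prob
    return {"rain": rain, "sprinkler": sprinkler, "grass_wet": grass_wet}

def infer(model, condition, query, n_samples=100_000):
    """Conditional inference by rejection sampling: estimate P(query | condition)."""
    accepted = [s for s in (model() for _ in range(n_samples)) if condition(s)]
    return sum(query(s) for s in accepted) / len(accepted)

# "Given that the grass is wet, how likely is it that it rained?"
print(infer(sprinkler_model,
            condition=lambda s: s["grass_wet"],
            query=lambda s: s["rain"]))
```

The point is only that the model is an ordinary program making random choices, and that you can still ask it conditional questions ("given that the grass is wet, did it rain?"); real probabilistic programming languages provide far richer modeling constructs and far more efficient inference.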

But just more broadly, what this is, think of this as a toolkit that allows us to combine several of the best ideas, not just of the recent deep learning era, but over... if you look back over the whole scope of AI as well as cognitive science, I think there's three or four ideas, and more, but definitely like three ideas we can really put up there that have proven their worth and have risen and fallen in terms of...

each of these ideas has had its time when the mainstream of the field thought it was totally the way to go and every other idea was obviously a waste of time, and has also had its time when many people thought it was a waste of time. And these three big ideas, I would say, are, first of all, the idea of symbolic representation, or symbolic languages for knowledge representation; probabilistic inference in generative models, to capture uncertainty, ambiguity, learning from sparse data, and, in their hierarchical setting, learning to learn.

And then, of course, the recent developments with neurally inspired architectures for pattern recognition. Each of these things, each of these ideas -- symbolic languages, probabilistic inference, and neural networks -- has some distinctive strengths that are real weak points of the other approaches.

So to take one example that I haven't really talked about here, but which was mentioned earlier as an outstanding challenge for neural networks: transfer learning, or learning to take knowledge across a number of previous tasks and transfer it to others. This is a real challenge and has always been a challenge in a neural net. But it's something that's addressed very naturally and very scalably in, for example, a hierarchical Bayesian model.

And if you look at some of the recent attempts, really interesting attempts within the deep learning world to try to get kinds of transfer learning and learning to learn, they're really cool. But many of them are in some ways kind of reinventing within a neural network paradigm, ideas that people, maybe just 10 or 15 years ago, developed in very sophisticated ways in, let's say, hierarchical Bayesian models.
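As a cartoon of why hierarchical Bayesian models handle this so naturally (a toy sketch with made-up numbers, not a model from any of those papers), here previous tasks are used to fit a shared prior, which then lets a brand-new task be estimated from a single observation:

```python
# Toy hierarchical-Bayes sketch of "learning to learn": old tasks let us fit a
# shared prior; that prior then does most of the work on a data-poor new task.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0                      # known within-task noise
mu_true, tau_true = 5.0, 0.5     # shared (unknown) prior over task means

# Many previous tasks, each with a handful of observations.
n_tasks, n_obs = 50, 5
task_means = rng.normal(mu_true, tau_true, n_tasks)
data = rng.normal(task_means[:, None], sigma, (n_tasks, n_obs))

# Empirical-Bayes estimate of the shared prior from the old tasks.
sample_means = data.mean(axis=1)
mu_hat = sample_means.mean()
tau2_hat = max(sample_means.var() - sigma**2 / n_obs, 1e-6)

# A new task with only ONE observation: shrink it toward the learned prior.
theta_new = rng.normal(mu_true, tau_true)
y_new = rng.normal(theta_new, sigma)
posterior_mean = (y_new / sigma**2 + mu_hat / tau2_hat) / (1 / sigma**2 + 1 / tau2_hat)
print(f"single observation: {y_new:.2f}, shrunk estimate: {posterior_mean:.2f}, "
      f"true task mean: {theta_new:.2f}")
```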

And a lot of attempts to get sort of symbolic algorithm-like behavior in neural networks again, are really, you know, they're very small steps towards something which is a very mature technology in computer systems and programming languages. Probabilistic programs, I'll just sort of advertise mostly, are a way to combine the strengths of all of these approaches, to have knowledge representations which are as expressive as anything that anybody ever did in the symbolic paradigm, that are as flexible at dealing with uncertainty and sparse data as anything in the probabilistic paradigm, but that also can support pattern recognition tools to be able to, for example, do very fast, efficient inference in very complex scenarios.

And there's a number of -- that's the kind of conceptual framework. There's a number of actually implemented tools. I point to here on the slide a number of probabilistic programming languages which you can go explore. For example, there's one that was developed in our group a few years ago, almost 10 years ago now, called Church, which was the antecedent of some of these other languages built on a functional programming core.

So Church is a probabilistic programming language built on the lambda calculus, or really on Lisp, basically. But there are many other more modern tools, especially if you are interested in neural networks. There are tools like, for example, Pyro or ProbTorch or BayesFlow that try to combine all these ideas, or, for example, the probabilistic programming systems coming out of Vikash Mansinghka's Probabilistic Computing group.

These are all things which are just in the very beginning stages, very, very alpha. But you can find out more about them online or by writing to their creators. And I think this is a very exciting place where the convergence of a number of different AI tools is happening.

And this will be absolutely necessary for making the kind of architecture that I'm talking about work. Another key idea, which we've been building on in our lab, and I think, again, many people are using some version of this idea, but maybe a little bit different from the way we're doing it, is -- well, the version of this idea that I like to talk about is what I call the game engine in the head.

So this is really about what the programs are about. When I talk about probabilistic programs, I haven't said anything about what kind of programs we're using. These probabilistic programming languages are, at their best, very general: Church, the language that was developed by Noah Goodman, Vikash, Dan Roy, and others in our group some 10 years ago, was intended to be a Turing-complete probabilistic programming language.

So any probability model that was computable or for whose inferences -- conditional inferences -- are computable, you could represent in these languages. But that leaves completely open what I'm actually going to -- what kind of program I'm going to write to model the world. And I've been very inspired in the last few years by thinking about the kinds of programs that are in modern video game engines.

So again, probably most of you are familiar with these, but if you're -- and increasingly they're playing a role in all sorts of ways in AI. But these are tools that were developed by the video game industry to allow a game designer to make a new game without having to do most of -- in some sense, most of the hard technical work from scratch, but rather to focus on the characters, the world, the story, the things that are more interesting for designing a novel game.

In particular, they let a player explore some new three-dimensional world, interact with it in real time, and see nice-looking graphics rendered interactively as the player moves around and explores. Or they let you populate the world with non-player characters that behave in an even vaguely intelligent way.

Okay? Game engines give you tools for doing all of this without having to write all the graphics or all the rules of physics from scratch. What are called game physics engines are, in some sense, a set of principles -- but also hacks -- from Newtonian mechanics and other areas of physics that let you simulate plausible-looking physical interactions in very complex worlds, very approximately but very fast.

There's also what's called game AI, which are basically very simple planning models. So let's say I want to have an AI in the game that is like a guard that guards a base, and a player is going to attack this base. So back in the old Atari days, like when I was a kid, the guards would just be like random things that would fire missiles kind of randomly in random directions at random times, right?

But let's say you want a guard to be a little intelligent, so to actually look around and "Oh, and I see the player," and then to actually start shooting at you and to even maybe pursue you. So that requires putting a little AI in the game, and you do that by having basically simple agent models in the game.

So what we think, and some of you might think this is crazy, and some of you might think this is a very natural idea -- I get both kinds of reactions -- what we think is that these tools of fast approximate renderers, physics engines, and sort of very simple kinds of AI planning are an interesting first approximation to the kinds of common sense knowledge representations that evolution has built into our brains.

So when we talk about the cognitive core, or how do babies start, ways in which a baby's brain isn't a blank slate, one interesting idea is that it starts with something like these tools, and then wrapped inside a framework for probabilistic inference -- that's what we mean by probabilistic programs -- that can support many activities of common sense perception and thinking.

So I'll just give you one example of what we call this intuitive physics engine. This is work that we did in our group, which Pete Battaglia and Jess Hamrick started about five years ago now, and it's also an illustration of the kind of experiment that you might do.

Since I keep talking about science, I'll show you now a couple of experiments. We would show people simple physical scenes, like these blocks-world scenes, and ask them to make a number of judgments. And the model we built does basically a little bit of probabilistic inference in a game-style physics engine.

It perceives the physical state and imagines a few different possible ways the world could go over the next one or two seconds, to answer questions like "Will the stack of blocks fall?" or "If they fall, how far will they fall?" or "Which way will they fall?" or "What would happen if say, one color of blocks or one material, like the green stuff, is ten times heavier than the grey stuff?" or vice versa.

"How will that change the direction of fall?" or "Look at those red and yellow blocks, some of which look like they should be falling, but aren't." So, why? Can you infer from the fact that they're not falling that one color block is much heavier than the other? Or let me show you a sort of a slightly weird task.

It's like other behavioral experiments. Sometimes we do weird things so that we can test ways in which you use your knowledge that you didn't just learn from pattern recognition, but use it to do new kinds of tasks that you'd never seen before. So here's a task which, you know, many of you have maybe seen me talk about these things, so you might have seen this task, but probably only if you saw me give a talk around here before.

We call this the red-yellow task, and again, we'll make this one interactive. So imagine that the table is bumped hard enough to knock some of the blocks onto the floor. You tell me, is it more likely to be red blocks or yellow blocks that fall? What do you say?

Red. Okay, good. How about here? Yellow. Good. How about here? Uh-huh. Here? Here? Okay. Here? Here? Okay. So you just experienced for yourself what it's like to be a subject in one of these experiments. We just did the experiment here. The data's all captured on video, sort of, right?

Okay. You can see that sometimes people were very quick, other times people were slower. Sometimes there was a lot of consensus, sometimes there was a little bit less consensus. Right? That reflects uncertainty. So again, there's a long history of studying this scientifically, that, you know, but you can see the probabilistic inference at work.

Probabilistic inference over what? Well, I would say one way to describe it is over one or a few short, low-precision simulations of the physics of these scenes. So here is what I mean by this. I'm going to show you a video of a game engine reconstruction of one of these scenes that simulates a small bump.

So here's a small bump. Here's that same scene with the big bump. Okay. Now notice that at the micro level, different things happen. But at the cognitive or macro level that matters for common sense reasoning, the same thing happened. Namely, all the yellow blocks went over onto one side of the table and few or none of the red blocks did.

So it didn't matter which of those simulations you ran in your head. You'd get the same answer in this case. This is one that's very easy and high confidence and quick. Also, you didn't have to run the simulation for very long. You only have to run it for a few time steps like that to see what's going to happen, or similarly here.

You only have to run it for a few time steps. And it doesn't have to be even very accurate. Even a fair amount of imprecision will give you basically the same answer at the level that matters for common sense. So that's the kind of thing our model does. It runs a few low-precision simulations for a few time steps.
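If you want to see the shape of that computation in code, here is a minimal sketch. The "physics" below is a crude stand-in for a real game physics engine, and the function and parameter names are invented for illustration; the structure -- noisy perceived state, a handful of short simulations, aggregate the coarse outcome -- is the point.

```python
# Toy sketch of "a few low-precision simulations for a few time steps."
import random

def simulate_bump(block_positions, bump_strength, steps=5):
    # block_positions: block id -> distance from the table's near edge
    fallen = set()
    velocities = {b: random.gauss(bump_strength, 0.5) for b in block_positions}
    positions = dict(block_positions)
    for _ in range(steps):
        for b in positions:
            positions[b] -= max(velocities[b], 0.0) * 0.1   # slide toward the edge
            velocities[b] *= 0.7                            # crude friction
            if positions[b] <= 0:
                fallen.add(b)
    return fallen

def prob_more_red_than_yellow(scene, bump_strength, n_sims=5):
    votes = 0
    for _ in range(n_sims):
        # perceptual noise on the inferred block positions
        noisy = {b: p + random.gauss(0, 0.1) for b, p in scene.items()}
        fallen = simulate_bump(noisy, bump_strength)
        red = sum(1 for b in fallen if b.startswith("red"))
        yellow = sum(1 for b in fallen if b.startswith("yellow"))
        votes += red > yellow
    return votes / n_sims

scene = {"red_1": 0.4, "red_2": 0.5, "yellow_1": 1.5, "yellow_2": 1.8}
print(prob_more_red_than_yellow(scene, bump_strength=1.0))
```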

But if you take the average of what happens there and you compare that with people's judgments, you get results like what I show you here. This scatter plot shows on the y-axis the average judgments of people. On the x-axis, the average judgments of this model. And it does a pretty good job.

It's not perfect, but the model basically captures people's graded sense of what's going on in this scene and many of these others. Okay? And it doesn't do it with any learning. But I'll come back to that in a second. It just does it by probabilistic reasoning over a game physics simulation.

Now we can use, and we have used, the same kind of technology to capture -- in very simple forms, really just proofs of concept at this point -- the kind of common sense physical scene understanding in a child playing with blocks or other objects, or what might go on in a young child's understanding of other people's actions, what we call the intuitive psychology engine, where the probabilistic programs are now defined over these very simple planning and perception programs.

And I won't go into any details. I'll just point to a couple of papers that my group played a very small role in -- we provided some models, working together with infant researchers. Both of these are experiments that were done with 10- or 12-month-old infants, so younger than even some of the babies I showed you before, but basically like that youngest baby, the one with the cat.

Here's an example of showing simple physical scenes with moving objects to 12-month-olds. They saw a few objects bouncing around inside a gumball machine, and at some point in time the scene gets occluded. Then, after another period of time, one of the objects appears at the bottom.

And the question is, is that the object you expected to see or not? Is it expected or surprising? The standard way you study what infants know is by what's called looking time methods. Just like an adult, if I show you something that's surprising you might look longer. If you're bored, you'll look away.

So you can do that same kind of thing with infants: by measuring how long they look at a scene, you can measure whether you've shown them something surprising or not. There are literally hundreds of studies, if not more, using looking-time measures to study what infants know. But only with this paper that we published a few years ago did we have a quantitative model, where we were able to show a quantitative relation between probability and surprise -- the lower the probability of the outcome, the greater the surprise.

Things that were objectively lower probability under one of these probabilistic physics simulations were reliably more surprising to infants, across a number of different manipulations: how fast the objects moved, where they were when the scene was occluded, how long the delay was, how many objects there were of one type or another, and other physically relevant variables. Infants' looking times tracked this model's predictions.
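The surprise measure itself is simple to state in code. Here is a hedged sketch: the simulate() function below is only a placeholder for the kind of noisy physics simulator sketched earlier, and the state representation is invented for illustration.

```python
# Surprise as negative log probability of the observed outcome, estimated by
# running the noisy simulator forward from the last visible state.
import math, random

def simulate(last_seen_state, occlusion_time):
    # Placeholder dynamics: which object exits the gumball machine first.
    # (A real model would simulate bouncing and occlusion time properly.)
    return random.choice(last_seen_state["objects"])

def surprise(last_seen_state, occlusion_time, observed_object, n=1000):
    hits = sum(simulate(last_seen_state, occlusion_time) == observed_object
               for _ in range(n))
    p = max(hits / n, 1.0 / n)        # avoid log(0)
    return -math.log(p)               # higher surprise should mean longer looking
```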

The experiments in that gumball-machine paper were done by Ernő Téglás in Luca Bonatti's lab. Or another paper that we published: here is a study that was done just recently by Shari Liu in Liz Spelke's lab -- they're at Harvard, but they're partners with us in CBMM -- which was about infants' understanding of goals.

So this is more about understanding of agents -- intuitive psychology -- where, again, in very simple cartoon scenes, you show an infant an agent that seems to be doing something: an animated cartoon character that jumps over a wall, or rolls up a hill, or jumps over a gap.

And the question is, basically, how much does the agent want the goal that it seems to be trying to achieve? And what this study showed, and the models here were done by Tomer Ullman, was that infants appeared to be sensitive to the physical work done by the agent. The more work the agent did, in the sense of the integral of force applied over a path, the more the infants thought the agent wanted the goal.
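In miniature, that inference looks something like the sketch below: compute the physical work from the agent's trajectory, then infer how much the agent must value the goal, assuming it only acts when the value exceeds the cost. The numbers and the softmax "rationality" parameter are assumptions for illustration, not the actual model Tomer Ullman built.

```python
# Work as a discrete approximation of the integral of force over the path,
# and a posterior over "how much the agent values the goal" given that it acted.
import math

def work_done(forces, displacements):
    return sum(f * d for f, d in zip(forces, displacements))

def posterior_over_value(cost, candidate_values, beta=2.0):
    # P(agent acts | value) grows with (value - cost); condition on "it acted".
    likelihoods = [1 / (1 + math.exp(-beta * (v - cost))) for v in candidate_values]
    z = sum(likelihoods)
    return [l / z for l in likelihoods]

cost = work_done(forces=[3.0, 3.0, 4.0], displacements=[0.2, 0.2, 0.3])
print(posterior_over_value(cost, candidate_values=[0.5, 2.0, 5.0]))
```

The more work the observed action required, the more the posterior shifts toward the higher values of the goal, which is the qualitative pattern the infants showed.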

We think of this as representing what we sometimes call the naive utility calculus. So the idea that there's a basic calculus of cost and benefit, you know, we take actions which are a little bit costly to achieve goal states which give us some reward. That's the most basic way, the oldest way, to think about rational, intentional action.

And it seems that even 10-month-olds understand some version of that, where the cost can be measured in physical terms. I see I'm running a little bit behind on time, and I wanted to leave some time for discussion. So I'll just go very quickly through a couple of other things, and happy to stay around at the end for discussion.

What I showed you here was the science. Where does the engineering go? So one thing you can do with this is, say, build a machine system that can look not at a little animated cartoon like these baby experiments, but a real person doing something. And again, combine physical cost and constraints of actions with some understanding of the agent's utilities.

That's the math of planning to figure out what they wanted. So look in this scene here, and see if you can judge which object the woman is reaching for. So you can see there's a grid of 4x4 objects. There's 16 objects here, and she's going to be reaching for one of them.

It's going to play in slow motion, but raise your hand when you know which one she's reaching for. So just watch and raise your hand when you know which one she wants. So most of the hands are up by now. And notice, I was looking at your hands, not here, but what happened is most of the hands were up about the time when that dashed line shot up.

That's not human data. You provided the data. This is our model. So our model is predicting, more or less, when you're able to say what her goal was. It's well before she actually touched the object. How does the model work? Again, I'll skip the details, but it does the same kind of thing that our models of those infants did.

Namely, inverse planning -- but in this case it does it with a full-body model from robotics. We use what's called the MuJoCo physics engine, which is a standard tool in robotics for planning physically efficient reaches of, say, a humanoid robot. And we can give this planner program a goal object as input.

We can give it each of the possible goal objects as input and say, "Plan the most physically efficient action," so the one that uses the least energy to get to that object. And then we can do a Bayesian inference. This is the probabilistic inference part. The program is the MuJoCo planner.

But then we can say, "I want to do Bayesian inference to work backwards from what I observed, which was the action, to the input to that program. What goal was provided as input to the planner?" And here you can see the full array of 4x4 possible inputs, and those bars that are moving up and down, that's the Bayesian posterior probability of how likely each of those was to be the goal.
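To make that backwards inference concrete, here is a minimal sketch. The plan_efficient_reach() argument is a placeholder standing in for the MuJoCo-based planner, and the straight-line planner in the usage example is just a dummy; the real system plans full-body, physically efficient reaches.

```python
# Bayesian inverse planning: posterior over candidate goals given a partial
# observed trajectory, assuming the observed motion is an efficient plan plus noise.
import math

def trajectory_mismatch(observed, planned):
    # sum of squared distances between corresponding hand positions
    return sum((ox - px) ** 2 + (oy - py) ** 2
               for (ox, oy), (px, py) in zip(observed, planned))

def posterior_over_goals(observed_traj, candidate_goals, plan_efficient_reach,
                         noise=0.05):
    scores = []
    for g in candidate_goals:
        planned = plan_efficient_reach(g, length=len(observed_traj))
        scores.append(math.exp(-trajectory_mismatch(observed_traj, planned) / noise))
    z = sum(scores)
    return [s / z for s in scores]   # one "bar" per candidate goal object

# Dummy planner for illustration: move in a straight line toward the goal.
def straight_line_plan(goal, length):
    gx, gy = goal
    return [(gx * t / (length - 1), gy * t / (length - 1)) for t in range(length)]

observed = [(0.0, 0.0), (0.24, 0.26), (0.51, 0.49)]
goals = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
print(posterior_over_goals(observed, goals, straight_line_plan))
```

As more of the reach is observed, the posterior sharpens onto one goal, which is what the rising dashed line in the demo corresponds to.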

And what you can see is that it converges on the right answer -- it turns out to be the ground-truth right answer, but it's also the right answer according to what people think -- with about the same amount of data that people needed. Now you might say, "Well, okay, sure, if I just wanted to build a system that could detect what somebody was reaching for, I could generate a training data set of this sort of scene and train something up to analyze patterns of motion." But again, because the engine in your head actually does something we think is more like this -- what we call inverse planning over a physics model -- it can apply to much more interesting scenes that you haven't really seen much of before.

So take the scene on the left, where again you see somebody reaching for one of a 4x4 array of objects, but what you see is a strange kind of reach. Can you see why he's doing a strange reach? Up there, it's a little small, but you can see that he's reaching over something, right?

It's actually a pane of glass, right? Do you see that? And then there's this other guy helping him, who sees what he wants and hands him the thing he wants. So how does the guy in the foreground see the other guy's goal? How does he infer his goal and know how to help him?

And then how do we look at the two of them and figure out who's trying to help who? Or that in a scene like this one here, that it's not somebody trying to help somebody, but rather the opposite. So here's a model on the left of how that might work, and we think this is the kind of model needed to tackle this sort of challenge here.

Basically, we take this model of planning -- maximum expected utility planning, which you can run backwards -- and then we recursively nest these models inside each other. So we say an agent is helping another agent if it appears, to us, to be maximizing an expected utility that is a positive function of its expectation about the other agent's expected utility. That's what it means to be a helper.
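That recursive nesting can be written down very compactly. Here is a hedged sketch covering both the helper and the hinderer described next, with one sign flip; the weight and the inner utility function are assumptions for illustration.

```python
# A social agent's utility: its own action cost, plus (helper) or minus
# (hinderer) the other agent's expected utility.
def other_agent_utility(world_state):
    # Placeholder: how well the other agent's own goal is satisfied here.
    return world_state.get("other_goal_progress", 0.0)

def social_utility(world_state, action_cost, stance):
    weight = {"helper": +1.0, "hinderer": -1.0}[stance]
    return weight * other_agent_utility(world_state) - action_cost

print(social_utility({"other_goal_progress": 0.8}, action_cost=0.3, stance="helper"))
```

Observing an agent repeatedly choosing actions that score highly under the "helper" utility, and poorly under the "hinderer" one, is evidence that it is helping; running the same inverse planning backwards over this nested utility is how the model reads off who is helping whom.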

Hindering is sort of the opposite: an agent seems to be trying to lower somebody else's utility. And we've used these same kinds of models to describe infants' understanding of helping and hindering in a range of scenes. I'll just say one last word about learning, because everybody wants to know about learning, and it's definitely part of any picture of AGI. But the thought I want to leave you with is really about what learning is about.

It'll be just a few more slides, and then I'll stop, I promise. None of the models I showed you so far really did any learning. They certainly didn't do any task-specific learning. We set up a probabilistic program and then we let it do inference. Now that's not to say that we don't think people learn to do these things.

We do. But the real learning goes on when you're much younger. Everything I showed you, in basic form, even a one-year-old baby can do; the basic learning that supports these kinds of abilities happens earlier. Not that there isn't learning beyond one year, but the basic way you learn to, say, solve these physics problems is what goes on in the brain of a child between zero and twelve months.

So this is just an example of some phenomena that come from the literature on infant cognitive development. These are very rough timelines. You can take pictures of this if you like. This is always a popular slide because it really is quite inspiring, I think, and I can give you lots of literature pointers, but I'm summarizing in very broad strokes with big error bars what we've learned in the field of infant cognitive development about when and how kids seem to at least come to certain understanding of basic aspects of physics.

So if you really want to study how people learn to be intelligent, a lot of what you have to study are kids at this age. You have to study what's already in their brain at zero months and what they learn and how they learn between four, six, eight, ten, twelve, and so on, and on up beyond that.

Now, effectively what that amounts to, we think, is if what you're learning is something like let's say an intuitive game physics engine to capture these basic abilities, then what we need, if we're going to try to reverse engineer that, is what you might think of as a program learning program.

If your knowledge is in the form of a program, then you have to have programs that build other programs. This is what I was talking about at the beginning about learning as building models of the world. Or ultimately, if you think what we start off with is something like a game engine that can play any game, then what you have to learn is the program of the game that you're actually playing, or the many different games that you might be playing over your life.

So think of learning as like programming the game engine in your head to fit with your experience and to fit with the possible actions that you seem like you can take. Now this is what you could call the hard problem of learning if you come to learning from, say, neural networks or other tools in machine learning.

So what makes most of machine learning go right now, and certainly what makes neural networks so appealing, is that you can set up basically a big function approximator that can approximate many of the functions you might want for a certain application or task, but in a way that's end-to-end differentiable, with a meaningful cost function.

So you can have one of these nice optimization landscapes, you can compute the gradients and basically just roll downhill until you get to an optimal solution. But if you're talking about learning as something like search in the space of programs, we don't know how to do anything like that yet.

We don't know how to set this up as any kind of nice optimization problem with any notion of smoothness or gradients. So rather than learning as effectively rolling downhill -- a process which, if you're willing to wait long enough, some simple algorithm will take care of -- what we need is something different.

Think of what we call the idea of learning as programming. There's a popular metaphor in cognitive development called the child as scientist, which emphasizes children as active theory builders and children's play as a kind of casual experimentation. But this is the algorithmic complement to that, what we call the child as coder, or around MIT we'll say the child as hacker.

But to the rest of the world, if you say child as hacker, they think of someone who breaks into your email and steals your credit card numbers. Around here, we all know that hacking is making your code more awesome. If your knowledge is some kind of code, or library of programs, then learning is all the ways that a child hacks on their code to make it more awesome.

More awesome can mean more accurate, but it can also mean faster, more elegant, more transportable to other applications or their tasks, more explainable to others, maybe just more entertaining. Children have all of those goals in learning. And the activities by which they make their code more awesome also correspond to many of the activities of coding.

So think about all the ways on a day to day basis you might make your code more awesome. You might have a big library of existing functions with some parameters that you can tune on a data set. That's basically what you do with backprop or stochastic gradient descent in training a deep learning system.
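That "tune the parameters of an existing function on a data set" kind of learning is easy to show in a few lines. Here is a minimal sketch -- stochastic gradient descent on a fixed model class (a line), which is the part that current deep learning does extremely well; the data and learning rate are made up.

```python
# Parameter tuning by stochastic gradient descent on a fixed function class.
import random

data = [(i / 10, 3.0 * (i / 10) + 1.0 + random.gauss(0, 0.1)) for i in range(20)]
w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    x, y = random.choice(data)
    err = (w * x + b) - y
    w -= lr * err * x       # gradient of squared error with respect to w
    b -= lr * err           # gradient with respect to b
print(w, b)                 # approaches the true slope (3.0) and intercept (1.0)
```

Everything in the rest of this list -- writing new functions, refactoring, building new libraries -- is learning that changes the program itself, not just its parameters, and that is the part we don't yet know how to do well.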

But think about all the ways in which you might actually modify the underlying functions. So write new code, or take old code from some other thing and map it over here, or make a whole new library of code, or refactor your code to some other basis that will work more robustly and be more extensible.

Or transpiling, or compiling, or even just commenting your code, or asking someone else for their code. Again, these are all ways that we make our code more awesome, and children's learning has analogs to all of these that we would want to understand as an engineer from an algorithmic point of view.

So in our group we've been working on various early steps towards this. And again, we don't have anything like program-writing programs at the level of children's learning algorithms. But here's one example of something we did in our group, which you might not have thought of as being about this; it's definitely the AI work from our group that got the most attention in the last couple of years.

We had this paper in Science -- it was actually on the cover of Science; it sort of just hit the market at the right time, if you like -- and it got about 100 times more publicity than anything else I've ever done. That's partly a testament to the really great work that Brendan Lake, the first author, did for his PhD here, but much more so to the hunger for AI systems at the time we published this, in 2015.

And we built a machine system that, the way we described it, was doing human-level concept learning for very simple visual concepts: handwritten characters in many of the world's alphabets. Compare this with the famous MNIST dataset, the dataset of handwritten digits 0 through 9, that drove so much good research in deep learning and pattern recognition.

It did that not because Yann LeCun, who put it together, or Geoff Hinton, who did a lot of work on deep learning with MNIST, were fundamentally interested in character recognition; they saw it as a very simple testbed for developing more general ideas. And similarly, we did this work on getting machines to do a kind of one-shot learning of generative models in order to develop more general ideas.

We saw this as learning very simple little probabilistic programs. In this case, what are those programs? They're the programs you use to draw a character. So ask yourself: how can you look at any one of these characters and see, in a sense, how somebody might draw it? The way we tested this was with a little visual Turing test, where we showed people one character in a novel alphabet and said, "Draw another one." Then we compared nine people's drawings, say on the left, with nine samples from our machine, say on the right, and we asked other people, "Can you tell which set was a human drawing, or imagining, another example, and which was the machine?" And people couldn't tell.

When I said, "One's on the left, one's on the right," I don't actually remember. And on different ones, you can see if you can tell. It's very hard to tell. Can you tell which is, for each one of these characters, which new set of examples were drawn by a human versus a machine?

Here's the right answer. And probably you couldn't tell. The way we did this was by assembling a simple kind of program learning program. So we basically said, when you draw a character, you're assembling strokes and substrokes with goals and subgoals that produce ink on the page. And when you see a character, you're working backwards to figure out, what was the program, the most efficient program that did that?
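Here is a toy version of that generate-and-invert structure, just to make it concrete. A "program" below is a short list of strokes, rendering draws them on a tiny grid, and seeing a character is searching for the program whose rendering best explains the observed image. The real Bayesian Program Learning model is far richer (substrokes, motor variability, compositional priors); this only shows the shape of the computation, with invented stroke coordinates.

```python
# Characters as stroke programs: generate by rendering strokes, perceive by
# searching for the best (simplest, best-fitting) program for an observed image.
import itertools

GRID = 8

def render(strokes):
    img = [[0] * GRID for _ in range(GRID)]
    for (r0, c0, r1, c1) in strokes:               # each stroke: a line segment
        steps = max(abs(r1 - r0), abs(c1 - c0), 1)
        for t in range(steps + 1):
            r = round(r0 + (r1 - r0) * t / steps)
            c = round(c0 + (c1 - c0) * t / steps)
            img[r][c] = 1
    return img

def score(program, observed):
    rendered = render(program)
    match = sum(rendered[r][c] == observed[r][c]
                for r in range(GRID) for c in range(GRID))
    return match - 2 * len(program)                # prior favors fewer strokes

def invert(observed, candidate_strokes, max_strokes=2):
    # Brute-force search over small stroke programs for the best explanation.
    return max((list(p) for k in range(1, max_strokes + 1)
                for p in itertools.combinations(candidate_strokes, k)),
               key=lambda p: score(p, observed))

candidates = [(0, 0, 7, 7), (0, 7, 7, 0), (0, 0, 0, 7), (7, 0, 7, 7)]
observed_x = render([(0, 0, 7, 7), (0, 7, 7, 0)])   # an "X" drawn with two strokes
print(invert(observed_x, candidates))               # recovers the two diagonal strokes
```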

So you're basically inverting a probabilistic program, doing Bayesian inference to the program most likely to have generated what you saw. This is one small step, we think, towards being able to learn programs, to being able to learn something ultimately like a whole game engine program. The last thing I'll leave you with is just a pointer to sort of work in action.

So this is some work being done by Kevin Ellis, a current PhD student who works partly with me and partly with Armando Solar-Lezama in CSAIL. It's an example of what's now an emerging, exciting area in AI, well beyond anything we're doing: combining techniques from Armando's world of programming languages -- not machine learning or AI, but tools that can automatically synthesize code -- with the machine learning toolkit, in this case a kind of Bayesian minimum-description-length idea. It's really one small step towards machines that can learn programs, by trying to efficiently find the shortest, simplest program that can capture some data set.
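The minimum-description-length flavor of that search is easy to illustrate. Below is a hedged sketch: a toy DSL over integer functions and a brute-force enumerator that picks the program minimizing description length plus reconstruction error. This is not the actual system Kevin and Armando built (which uses real program synthesis machinery); it only shows the objective being optimized.

```python
# "Shortest, simplest program that captures the data": enumerate small programs
# in a toy DSL and minimize description length + fit error.
import itertools

PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dbl": lambda x: 2 * x,
    "sq":  lambda x: x * x,
}

def run(program, x):
    for name in program:
        x = PRIMITIVES[name](x)
    return x

def description_length(program):
    return len(program)                       # one unit of cost per primitive

def fit_error(program, examples):
    return sum(abs(run(program, x) - y) for x, y in examples)

def best_program(examples, max_len=3):
    candidates = (list(p) for k in range(1, max_len + 1)
                  for p in itertools.product(PRIMITIVES, repeat=k))
    return min(candidates,
               key=lambda p: description_length(p) + fit_error(p, examples))

# Examples consistent with "add one, then square":
print(best_program([(1, 4), (2, 9), (3, 16)]))   # -> ['inc', 'sq']
```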

So we think by combining these kinds of tools, in this case, let's say, from Bayesian inference over programs with a number of tools that have been developed in other areas of computer science that don't look anything or haven't been considered to be machine learning or AI, like programming languages, it's one of the many ways that going forward we're going to be able to build smarter, more human-like machines.

So just to end, then: what I've tried to do here is, first, identify the ways in which human intelligence goes beyond pattern recognition to all these activities of modeling the world; second, give you a sense of some of the domains where we can start to study this -- common sense scene understanding, for example, or one-shot learning, like what we were just doing there, or learning as programming the engine in your head; and third, give you a sense of some of the technical tools -- probabilistic programs, program synthesis, game engines, as well as a little bit of deep learning -- that, brought together, are starting to make these things real.

Now, that's the science agenda and the reverse engineering agenda, but think about, for those of you who are interested in technology, what are the many big AI frontiers that this opens up? So the one I'm most excited about is this idea which I've highlighted here in our big research agenda.

This is the one I'm most excited to work on for, honestly, maybe the rest of my career. It's really the oldest and maybe the best dream of AI researchers for how to build a human-like intelligent system, a real AGI system.

It's the idea that Turing proposed when he proposed the Turing test, or Marvin Minsky proposed this at different times in his life, or many people have proposed this, right, which is to build a system that grows into intelligence the way a human does, that starts like a baby and learns like a child, and I've tried to show you how we're starting to be able to understand those things.

What does a baby's mind start with? How do children actually learn? Looking forward, we might imagine that someday we'll be able to build machines that can do this. I think we can actually start working on this right now, and that's something we're doing in our group. So if that kind of thing excites you, I encourage you to work on it, maybe even with us. Or if any of these other activities of human intelligence excite you, I think taking the kind of science-based reverse-engineering approach we're taking, and then trying to put it into engineering practice, is not just a possible route but quite possibly the most valuable route you could work on right now to try to achieve at least some kind of artificial general intelligence -- especially the kind of AI system that's going to live in a human world and interact with humans.

There's many kinds of AI systems that could live in worlds of data that none of us can understand or will ever live in ourselves, but if you want to build machines that can live in our world and interact with us the way we are used to interacting with other people, then I think this is a route that you should consider.

Thank you. Hi there. So, earlier in the talk you expressed some skepticism about whether or not industry would get us to understanding human-level intelligence. It seems that there are a couple of trends that favour industry. One is that industry is better than academia at accumulating resources and ploughing them back into the topic, and it seems at the moment we've got a bit of brain drain going on from academia into industry, and that seems like an ongoing trend.

If you look at something like learning to fly or learning to fly into space, then it looks like the story is one of industry kind of taking over the field and going off on its own a little bit. Academics still have a role, but industry kind of dominates. Is industry going to overtake the field, do you think?

Well, that's a really good question, and it's got several good questions packed into one. I didn't mean to say "Go academia, bad industry." What I tried to say is that the approaches currently getting the most attention in industry are the ones that are really the most valuable right now, in the short term -- industry is mostly focused on what it can do, on the value propositions on basically a two-year time scale at most.

If you ask, say, Google researchers, to take the most prominent example, that's pretty much what they'll all tell you: maybe things that might pay off initially in two years but take five years or more to really develop; but if you can't show that it's going to do something practical in two years in a way that matters for the bottom line, then it's not really worth doing.

What I'm talking about is the technologies which industry right now sees as meeting that specification. What I'm saying is that I don't think those are the most valuable or promising route to human-like kinds of AI systems. But I hope, as in the cases you mentioned, that the basic research we're doing now will be successful enough that it gets the attention of industry when the time is right.

I hope at some point at least the engineering side will have to be done in industry, not just in academia. But you're also pointing to issues like brain drain and other things like that. These are real issues confronting our community. I think everybody knows this, and I'm sure this will come up multiple times here. I think we have to find ways, even now, to combine the best of the ideas, the energy, and the resources of academia and industry if we want to keep doing something interesting.

If we just want to redefine AI to be whatever people currently call AI but scaled up, then fine, forget about it. Or if we just want to say, let me and people like me do what we're doing at what industry would consider a snail's pace on toy problems, okay, fine.

But if I want to take what I'm doing to a level that really pays off -- a level that industry can appreciate, or that has technological impact on a broad scale -- or if industry wants to take what it's doing and really build machines that are actually intelligent, or machine learning that actually learns like a person, then I think we need each other now, and not just at some point in the future.

So this is a general challenge for MIT and for everywhere, and for Google. We just spent a few days talking to Google about exactly this issue. In fact, this was a talk I prepared partly for that purpose; we wanted to raise those issues. I can think of some solutions to the problem of what you could call brain drain, from the academic point of view, or what you could call narrowing in on certain local minima, from the industry point of view.

But they will require the leadership of both academic institutions like MIT and companies like Google being creative about how they might work together in ways that are a little bit outside of their comfort zone. I hope that will start to happen including at MIT and at many other universities and at companies like Google and many others and I think we need it to happen for the health of all parties concerned.

- Okay, thank you very much. - Thanks. - I'm curious about the premise you gave, that one of the big missing pieces for intelligence is that we need to teach machines how to build models. And I'm curious where you think non-goal-oriented cognitive activity comes into play there.

Things like feelings and emotions -- why do you think that might not necessarily be the most important question? - The only reason emotions didn't appear on my slide -- well, there are a few reasons, but mainly the slide is only so big, and I wanted the font to be big and readable for such an important slide.

I have versions of my slide in which I do talk about that. Okay. It's not that I think feelings or emotions aren't important. I think they are important, and I used to not have many insights about what to do about them. But actually, partly based on some of my colleagues here at MIT BCS -- Laura Schulz and Rebecca Saxe, two of my cognitive colleagues who I work closely with -- they've been starting to do research on how people understand emotions, both their own and others', and we've been starting to work with them on computational models.

So that's actually something I'm actively interested in and even working on. But I would say, and again for those of you who study emotion or know about this, actually you're going to have Lisa coming in, right? She's going to basically say a version of the same thing, I think.

She's one of the world's experts on this, and the deepest way to understand emotion is very much based on our mental models of ourselves, of the situation we're in, and of other people. Lisa will talk all about this, but you might think of emotion as just a very small set of what are sometimes called basic emotions -- being happy or angry or sad -- a small number of them, right?

There are usually only a few, right? And you might see those as very basic things, opposed to some kind of cognitive activity. But think about all the different words we have for emotions. For example, think about a famously cognitive emotion like regret.

What does it mean to feel regret or frustration, right? To know both for yourself when you're not just feeling kind of down or negative, but you're feeling regret, that means something like I have to feel like there's a situation that came out differently from how I hoped, and I realized I could have done something differently, right?

So that means you have to be able to understand, you have to have a model, you have to be able to do a kind of counterfactual reasoning and to think oh, if only I had acted a different way, then I can predict that the world would have come out differently, and that's the situation I wanted, but instead it came out this other way, right?

Or think about frustration: that requires something like understanding, okay, I've tried a bunch of times, I thought this would work, but it doesn't seem to be working, maybe I'm ready to give up. Those are very important human emotions. We need to understand them -- to understand ourselves, to understand other people, to understand communication -- but they are all filtered through the kinds of models of action I was just talking about, with these cost-benefit analyses of action.

So I'm just trying to say that I think this is very basic stuff, but it will be the basis for building better engineering-style models of the full spectrum of human emotion, beyond just "I'm feeling good or bad or scared," okay? And I think when you see Lisa, she will, in her own way, say something very similar.

Interesting. Thanks. Yeah. Thanks, Josh, for your nice talk. So this is all about human cognition and trying to build models that mimic that cognition, but how much does it help you to understand how the circuits implement those things? You mean like the circuits in the brain? Yeah. Is that what you work on, by any chance?

Sorry, what? Is that what you work on, by any chance? Yeah. Yeah, I know. I'm kidding. Yeah. So in the Center for Brains, Minds, and Machines, as well as in Brain and Cognitive Science, I have a number of colleagues who study the actual hardware basis of this stuff in the brain. That includes the large-scale architecture of the brain, like what Nancy Kanwisher and Rebecca Saxe study with functional brain imaging, or the more detailed circuitry, which usually requires recording from, say, non-human brains at the level of individual neurons and connections between neurons.

All right. So I'm very interested in those things, although it's not mostly what I work on, right? But I would say, you know, again, like in many other areas of science, certainly in neuroscience, the kind of work I'm talking about here in a sort of classic reductionist program sets the target for what we might look for.

What I would assert -- or my working conjecture -- is that if you do the kind of work I'm talking about here, it gives you the right targets, or at least a candidate set of targets, for what the neural circuits might be computing.

Whereas if you just go in and just say, start poking around in the brain, or have some idea that what you're going to try to do is find the neural circuits which underlie behavior, without a sense of the computations needed to produce those behaviors, I think it's going to be very difficult to know what to look for, and to know when you've found even viable answers.

So I think that's the standard kind of reductionist program, but it's not, I also think it's not one that is divorced from the study of neural circuits. It's also one, if you look at the broad picture of reverse engineering, it's one where neural circuits and understanding the circuits in the brain play an absolutely critical role, okay?

I would say, when you look at the brain at the hardware level as an engineer, I'm mostly looking at the software level, right? But when you look at the hardware level, there are some remarkable properties. One remarkable property again is how much parallelism there is, and in many ways how fast the computations are, okay?

Neurons are slow, but the computations of intelligence are very fast. So how do we get elements that are in some sense quite slow in their time constant to produce such intelligent behavior so quickly? That's a great mystery, and I think if we understood that, it would have payoff for building all sorts of basically application-embedded circuits, okay?

But also maybe most important is the power consumption, and again, many people have noted this, right? Look at the power that the brain consumes -- like, what did I eat today? Almost nothing. My daughter, who's doing an internship here -- literally, yesterday all she ate was a burrito, and yet she wrote 300 lines of code for her internship project, a really cool computational linguistics project.

So somehow she turned a burrito into a model of child language acquisition, okay? But how did she do that, or how do any of us do this, right? If you look at the power we consume when we simulate even a very, very small chunk of cortex on conventional hardware, or when we do any kind of machine learning, we have systems which are very, very far from the power of the human brain computationally, but which, in terms of physical energy consumed, are way past what any individual brain uses.

So how do we get circuitry of any sort, biological or just any physical circuits, to be as smart as we are with as little energy as we are? This is a huge problem for basically every area of engineering, right? If you want to have any kind of robot, the power consumption is a key bottleneck.

Same for self-driving cars. If we want to build AI without contributing to global warming and climate change, let alone use AI to solve climate change, we really need to address these issues, and the brain is a huge guide there. I think there are some people who are really starting to think about this.

How can we, say, for example, build somehow brain-inspired computers which are very, very low power, but maybe only approximate? So I'm thinking here of Joe Bates. I don't know if any of you know Joe. He's been around MIT and other places for quite a while. Can I tell them about your company?

So Joe has a startup in Kendall Square called Singular Computing, and they have some very interesting ideas, including some actual implemented technology for low-power, approximate computing in a sort of a brain-like way that might lead to possibly even, like, the ability to build something -- this is Joe's dream -- to build something that's about the size of this table but that has a billion cores, a billion cores, and runs on a reasonable kind of power consumption.

I would love to have such a machine. If anybody wants to help Joe build it, I think he'd love to talk to you. But it's one of a number of ideas. I mean, Google X, people are working on similar things. Probably most of the major chip companies are also inspired by this idea.

And I think, even if you didn't think you were interested in the brain, if you want to build the kind of AI we're talking about and run it on physical hardware of any sort, then for understanding how to compute what the brain computes with as little power as it does, I don't know any better place to look than the brain's own circuits.

It seems like a lot of the improvements in AI have been driven by increasing computational power. How far would you say -- You mean like GPUs or CPUs? How far would you say we are from hardware that could run a general artificial intelligence? Of the kind that I'm talking about?

Yeah, I don't know. I'll start with a billion cores and then we'll see. I mean, I think we're -- I think there's no way to answer that question in a way that's software independent. I don't know how to do that. But I think that it's -- I don't know.

When you say how far are we, you mean how far am I with the resources I have right now? How far am I if Google decides to put all of its resources at my disposal like they might if I were working at DeepMind? I don't know the answer to that question.

I think what we can say is this. Individual neurons -- I mean, again, this goes back to another reason to study neural circuits. If you look at what we currently call neural networks in the AI side, the model of a neuron is this very, very simple thing. Individual neurons are not only much more complex, but have a lot more computational power.

It's not clear how they use it or whether they use it, but I think it's just as likely that a neuron is something like a computer as that a neuron is something like a ReLU. Like, one neuron in your brain is more like a CPU node, maybe. And thus the large number of neurons in your brain -- I think it's something like 10 billion cortical pyramidal neurons -- might be like 10 billion cores.

That's at least as plausible, I think, to me as any other estimate. I think we're definitely on the underside, with very big error bars. I completely agree -- if this is what you might be suggesting, and going back to my answer to your question -- that I don't think we're going to get to what I'm talking about, or anything like real brain scale, without major innovations on the hardware side.

It's interesting that what drove those innovations that support current AI was mostly not AI. It was the video game industry. When I point to the video game engine in your head, that's a similar thing that was driven by the video game industry on the software side. I think we should all play as many video games as we can and contribute to the growth of the video game industry.

You can see this in industry already. For example, there's a company called Improbable, a London-based startup, a pretty sizable startup at this point, which is building something they call SpatialOS. It's not a hardware idea, but a kind of software idea for very, very big distributed computing environments that can run much more complex, realistic simulations of the world, for much more interesting, immersive, persistent video games.

I think that's one thing that might -- hopefully that will lead to more fun, new kinds of games. But that's one example of where we might look to that industry to drive some of the -- just computer systems, really hardware and software systems that will take our game to the next level.

Josh, at the algorithmic or cognitive level, understanding learning -- the meaning of learning -- would be learning how to predict. But on the circuit level it's different. But at the what level? On the circuit level. Well, of course it's different, right? But already I think you made a mistake there, honestly.

Like, you said the cognitive level is learning how to predict, but I'm not sure what you mean by that. There's many things you could mean, and what our cognitive science is about is learning which of those versions -- like, I don't think it's learning how to predict. I think it's learning what you need to know to plan actions and to -- you know, all those things.

Like, it's not just about predicting, because there are things we can imagine that you would never predict, because they would never happen unless we somehow made the world different. So generalization, sorry, not prediction -- when your model can generalize. But especially in the transfer learning that you are interested in, a few hundred neurons in prefrontal cortex can generalize a lot.

But a Bayesian model couldn't do that. You're saying a Bayesian model won't do that? Or that they don't do it the way a Bayesian model does? For sure, because that's at the abstract level. Well, how do you really know? What does it mean to say that some neurons do it?

So maybe another way to put this is to say, look, we have a certain math that we use to capture these -- you could call it abstract, or I call it software level abstractions, right? I mean, all engineering is based on some kind of abstraction. But you might have a circuit level abstraction, a certain kind of hardware level that you're interested in describing the brain at.

And I'm mostly working at or starting from a more software level of abstraction. They're all abstractions. We're not talking about molecules here. We're talking about some abstract notion of maybe a circuit, or of a program. Now it's a really interesting question. If I look at some circuits, how do I know what program they're implementing?

If I look at the circuits in this machine, could I tell what program they're implementing? Well, maybe, but certainly it would be a lot easier if I knew something about what programs they might be implementing before I start to look at the circuitry. If I just looked at the circuitry without knowing what a program was, or what programs the thing might be doing, or what kind of programming components would be mappable to circuits in different ways, I don't even know how I'd begin to answer that question.

So I think we've made some progress at understanding what neurons are doing in certain low-level parts of the sensory systems and certain parts of the motor system, like primary motor cortex -- basically, the parts of the brain that are closest to its inputs and outputs, where you could say we don't need the kind of software abstractions I'm talking about, or where we basically agree on what those abstractions already are, so we can make enough progress on knowing what to look for and how to know when we've found it.

But if you want to talk about flexible planning, things that are more like cognition, that go on in prefrontal cortex, at this point, I don't think that just by recording from those neurons, we're going to be able to answer those questions in a meaningful engineering way. A way that any engineer, software, hardware, whatever, could really say, "Yeah, okay, I get it.

I get those insights in a way that I can engineer with." And that's what my goal is. So that's my goal to do at the software level, the hardware level, or the entire systems level, connecting them. And I think that we can do that by taking what we're doing and bringing it to contact with people studying neural circuits.

But I don't think you can leave this level out and just go straight to the neural circuits. And I think the more progress we make, the more we can help people who are studying at the neural circuit level. And they can help us address these other engineering questions that we don't really have access to, like the power issue or the speed issue.

Thanks. That was great. So maybe we give Josh a big hand.