
MIT AGI: Cognitive Architecture (Nate Derbinsky)


Chapters

0:00 Intro
0:43 Outline
2:40 Expectations Meet (Current) Reality
3:52 Common Motivations
5:25 Motivations/Questions Dictate Approach
10:33 Unified Theories of Cognition
12:07 Making (Scientific) Progress Lakatos 1970
13:43 Time Scales of Human Action Newell 1990
15:32 Bounded Rationality Simon 1967
17:36 Physical Symbol System Hypothesis Newell & Simon 1976
19:36 Active Architectures by Focus
21:26 Semantic Pointer Architecture Unified Network
23:32 Prototypical Architecture
29:10 ACT-R Notes
33:58 Soar Notes
37:47 LuminAI ADAM Lab @ GATech
40:54 Rosie
45:52 Soar 9 [Laird 2012] Memory Integration
47:53 One Research Path
48:54 Problem ala Word-Sense Disambiguation
50:23 WSD Evaluation Historical Memory Retrieval Bias
50:58 Efficiency
52:30 Related Problem: Memory Size
54:33 Efficient Implementation
54:39 Approximation Quality
54:55 Prediction Complexity
55:02 Prediction Computation
55:07 Task #1: Mobile Robotics
55:52 Results: Decision Time
56:29 Task #2: Liar's Dice Michigan Liar's Dice
56:56 Reasoning -- Action Knowledge
57:23 Forgetting Action Knowledge
58:18 Results: Competence
59:23 Some CogArch Open Issues

Whisper Transcript

00:00:00.000 | So today we have Nate Derbinsky.
00:00:02.720 | He's a professor at Northeastern University
00:00:05.140 | working on various aspects of computational agents
00:00:08.720 | that exhibit human level intelligence.
00:00:11.520 | Please give Nate a warm welcome.
00:00:13.780 | (audience applauding)
00:00:16.940 | - Thanks a lot and thanks for having me here.
00:00:20.800 | So the title that was on the page was Cognitive Modeling.
00:00:25.680 | I'll kind of get there, but I wanted to put it in context.
00:00:28.200 | So the bigger theme here is I wanna talk about
00:00:31.160 | what's called cognitive architecture.
00:00:33.000 | And if you've never heard about that before, that's great.
00:00:35.800 | And I wanted to contextualize that as
00:00:37.760 | how is that one approach to get us to AGI?
00:00:41.900 | I'm gonna say what my view of AGI is
00:00:46.400 | and put up a whole bunch of TV and movie characters
00:00:50.120 | that I grew up with that inspire me.
00:00:52.540 | That'll lead us into
00:00:53.380 | what is this thing called cognitive architecture?
00:00:55.320 | It's a whole research field that crosses neuroscience,
00:00:58.760 | psychology, cognitive science, and all the way into AI.
00:01:02.360 | So I'll try to give you kind of the historical,
00:01:04.340 | big picture view of it,
00:01:05.920 | what some of the actual systems are out there
00:01:08.520 | that might be of interest to you.
00:01:10.000 | And then we'll kind of zoom in on one of them
00:01:11.820 | that I've done a good amount of work with called SOAR.
00:01:14.640 | And what I'll try to do is tell a story,
00:01:17.600 | a research story of how we started
00:01:20.220 | with kind of a core research question.
00:01:22.700 | We'd look to how humans operate,
00:01:27.040 | understood that phenomenon,
00:01:28.600 | and then took it and saw really interesting results from it.
00:01:31.400 | And so at the end, if this field is of interest,
00:01:34.240 | there's a few pointers for you to go read more
00:01:37.100 | and go experience more of cognitive architecture.
00:01:39.540 | So just rough definition of AGI, given this is an AGI class.
00:01:45.600 | Depending on the direction that you're coming from,
00:01:49.480 | it might be kind of understanding intelligence
00:01:51.720 | or it might be developing intelligence systems
00:01:54.680 | that are operating at the level of human level intelligence.
00:01:57.520 | The typical differences between this
00:02:00.680 | and other sorts of maybe AI machine learning systems,
00:02:03.760 | we want systems that are gonna persist
00:02:06.000 | for a long period of time.
00:02:07.960 | We want them robust to different conditions.
00:02:10.760 | We want them learning over time.
00:02:12.560 | And here's the crux of it, working on different tasks.
00:02:16.060 | And in a lot of cases,
00:02:17.520 | tasks they didn't know were coming ahead of time.
00:02:21.560 | I got into this because I clearly watched too much TV
00:02:25.160 | and too many movies.
00:02:26.720 | And then I looked back at this and I realized,
00:02:28.400 | I think I'm covering 70s, 80s, 90s,
00:02:32.640 | noughts, I guess it is, and today.
00:02:35.220 | And so this is what I wanted out of AI
00:02:38.760 | and this is what I wanted to work with.
00:02:41.280 | And then there's the reality that we have today.
00:02:44.800 | So instead of, so who's watched "Knight Rider" for instance?
00:02:51.000 | I don't think that exists yet,
00:02:53.860 | but maybe we're getting there.
00:02:56.560 | And in particular, for fun, during the Amazon sale day,
00:03:01.260 | I got myself an Alexa
00:03:03.200 | and I could just see myself at some point saying,
00:03:05.280 | hey Alexa, please write me an rsync script
00:03:07.880 | to sync my class.
00:03:10.560 | And if you have an Alexa,
00:03:11.720 | you probably know the following phrase.
00:03:13.400 | This just always hurts me inside, which is,
00:03:16.240 | sorry, I don't know that one.
00:03:19.520 | Which is okay, right?
00:03:20.440 | That's, a lot of people have no idea what I'm asking,
00:03:23.440 | let alone how to do that.
00:03:24.820 | So what I want Alexa to respond with after that is,
00:03:28.720 | do you have time to teach me?
00:03:30.160 | And to provide some sort of interface
00:03:33.300 | by which back and forth we can kind of talk through this.
00:03:36.280 | We aren't there yet, to say the least,
00:03:39.440 | but I'll talk later about some work on a system called Rosie
00:03:44.440 | that's working in that direction.
00:03:46.460 | We're starting to see some ideas
00:03:48.480 | about being able to teach systems how to work.
00:03:51.200 | So folks who are in this field,
00:03:54.600 | I think generally fall into these three categories.
00:03:58.760 | They're just curious.
00:03:59.880 | They want to learn new things,
00:04:01.560 | generate knowledge, work on hard problems.
00:04:04.160 | Great.
00:04:05.120 | I think there are folks who are in
00:04:06.960 | kind of that middle cognitive modeling realm.
00:04:09.800 | And so I'll use this term a lot.
00:04:11.960 | It's really understanding how humans think,
00:04:15.200 | how humans operate, human intelligence at multiple levels.
00:04:18.880 | And if you can do that,
00:04:20.680 | one, there's just knowledge in and of itself
00:04:22.920 | of how we operate,
00:04:23.880 | but there's a lot of really important applications
00:04:26.020 | that you can think of.
00:04:27.000 | If we were able to not only understand,
00:04:30.320 | but predict how humans would respond,
00:04:32.880 | react in various tasks.
00:04:35.280 | Medicine is an easy one.
00:04:37.960 | There's some work in HCI or HRI,
00:04:40.720 | I'll get to later,
00:04:42.620 | where if you can predict how humans would respond to a task,
00:04:46.080 | you can iterate tightly and develop better interfaces.
00:04:50.200 | It's already being used in the realm of simulation
00:04:52.520 | and in defense industries.
00:04:55.200 | I happen to fall into the latter group,
00:04:57.880 | or the bottom group,
00:04:58.840 | which is systems development,
00:05:00.640 | which is to say just the desire to build systems
00:05:02.960 | for various tasks that are working on tasks
00:05:05.600 | that kind of current AI machine learning can't operate on.
00:05:09.840 | And I think when you're working at this level
00:05:12.620 | or on any system that nobody's really achieved before,
00:05:16.660 | what do you do?
00:05:17.500 | You kind of look to the examples that you have,
00:05:19.300 | which in this case that we know of,
00:05:21.940 | it's just humans, right?
00:05:23.380 | Irrespective of your motivation,
00:05:27.880 | when you have kind of an intent
00:05:31.500 | that you want to achieve in your research,
00:05:33.280 | you kind of let that drive your approach.
00:05:35.740 | And so I often show my AI students this.
00:05:39.520 | The Turing test you might've heard of,
00:05:41.660 | or variants of it that have come before,
00:05:44.820 | these were folks who were trying to create systems
00:05:47.020 | that acted in a certain way,
00:05:48.340 | that acted intelligently.
00:05:49.900 | And the kind of line that they drew,
00:05:51.980 | the benchmark that they used was to say,
00:05:54.200 | let's make systems that operate like humans do.
00:05:57.000 | Cognitive modelers will fit up into this top point here
00:06:01.380 | to say it's not enough to act that way,
00:06:04.420 | but by some definition of thinking,
00:06:07.660 | we want the system to do what humans do,
00:06:11.080 | or at least be able to make predictions about it.
00:06:12.980 | So that might be things like,
00:06:14.560 | what errors would the human make on this task?
00:06:17.020 | Or how long would it take them to perform this task?
00:06:19.800 | Or what emotion would be produced in this task?
00:06:22.780 | There are folks who are still thinking about
00:06:26.240 | how the computer is operating,
00:06:28.100 | but trying to apply kind of rational rules to it.
00:06:32.600 | So a logician, for instance, would say,
00:06:35.440 | if you have A, and A gives you B,
00:06:37.860 | and B gives you C,
00:06:39.160 | then A should definitely give you C.
00:06:41.320 | That's just what's rational.
00:06:42.580 | And so there are folks operating in that direction.
00:06:44.920 | And then if you go to intro AI class
00:06:47.780 | anywhere around the country,
00:06:48.700 | particularly Berkeley,
00:06:50.180 | because they have graphics designers
00:06:51.820 | that I get to steal from,
00:06:54.180 | the benchmark would be what the system produces
00:06:57.520 | in terms of action,
00:06:59.080 | and the benchmark is some sort of optimal rational bound.
00:07:04.940 | Irrespective of where you work in this space,
00:07:07.240 | there's kind of a common output that arrives
00:07:11.960 | when you research these areas,
00:07:13.480 | which is you can learn individual bits and pieces,
00:07:17.760 | and it can be hard to bring them together
00:07:20.560 | to build a system that either predicts
00:07:22.660 | or acts on different tasks.
00:07:24.960 | So this is part of the transfer learning problem,
00:07:27.040 | but it's also part of having distinct theories
00:07:30.840 | that are hard to combine together.
00:07:32.640 | So I'm gonna give an example
00:07:33.480 | that comes out of cognitive modeling,
00:07:35.860 | or perhaps three examples.
00:07:37.220 | So if you were in a HCI class
00:07:40.020 | or some intro psychology classes,
00:07:42.580 | one of the first things you learn about is Fitts' Law,
00:07:45.320 | which provides you the ability
00:07:46.900 | to predict the difficulty level
00:07:50.960 | of basically human pointing
00:07:53.120 | from where they start to a particular place.
00:07:55.760 | And it turns out that you can learn some parameters
00:07:58.720 | and model this based upon just the distance
00:08:02.020 | from where you are to the target
00:08:04.080 | and the size of the target.
00:08:06.040 | So both moving a long distance will take a while,
00:08:08.500 | but also if you're aiming for a very small point,
00:08:11.200 | that can take longer than if there's a large area
00:08:13.320 | that you just kind of have to get yourself to.
00:08:15.640 | And so this is held true for many humans.
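As a reference for the above, Fitts' Law is commonly written as MT = a + b * log2(D/W + 1); here is a minimal sketch with placeholder parameters rather than values fitted to any person or device:

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Fitts' Law: MT = a + b * log2(D/W + 1).

    a and b are fit per person and input device; the defaults here
    are placeholders for illustration, not measured values.
    """
    return a + b * math.log2(distance / width + 1)

# A far, small target takes longer than a near, large one.
print(fitts_movement_time(distance=800, width=20))   # ~0.90 s
print(fitts_movement_time(distance=100, width=100))  # ~0.25 s
```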
00:08:19.100 | So let's say we've learned this,
00:08:21.080 | and then we move on to the next task,
00:08:22.880 | and we learn about what's called the power law of practice,
00:08:26.640 | which has been shown true in a number of different tasks.
00:08:30.680 | What I'm showing here is one of them,
00:08:32.000 | where you're going to draw a line
00:08:34.560 | through a sequential set of circles here,
00:08:36.760 | starting at one, going to two, and so forth,
00:08:39.680 | not making a mistake, or at least not trying to,
00:08:42.520 | and try to do this as fast as possible.
00:08:44.600 | And so for a particular person,
00:08:47.440 | we would fit the A, B, and C parameters,
00:08:49.420 | and we'd see a power law.
00:08:50.740 | So as you perform this task more,
00:08:53.280 | you're gonna see a decrease in the amount
00:08:56.320 | of reaction time required to complete the task.
00:08:59.320 | Great, we've learned two things about humans.
00:09:02.060 | Let's add some more in.
00:09:03.220 | So for those who might have done
00:09:05.040 | some reinforcement learning,
00:09:05.940 | TD learning is one of those approaches,
00:09:08.260 | temporal difference learning,
00:09:09.860 | that's had some evidence of similar sorts of processes
00:09:14.860 | in the dopamine centers of the brain.
00:09:16.940 | And it basically says in a sequential learning task,
00:09:19.900 | you perform the task, you get some sort of reward.
00:09:23.140 | How are you going to kind of update your representation
00:09:25.780 | of what to do in the future,
00:09:26.800 | such as to maximize expectation of future reward?
00:09:29.940 | And there are various models of how that changes over time,
00:09:33.220 | and you can build up functions that allow you
00:09:35.460 | to perform better and better and better
00:09:36.820 | given trial and error.
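A minimal sketch of the temporal-difference idea being described, written as a tabular TD(0) update (illustrative only, not any particular model from the talk):

```python
def td_update(V, state, next_state, reward, alpha=0.1, gamma=0.9):
    """One tabular TD(0) step: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * td_error
    return td_error

# Toy example: two states, one observed transition with reward 1.
V = {"s0": 0.0, "s1": 0.0}
td_update(V, "s0", "s1", reward=1.0)
print(V)  # {'s0': 0.1, 's1': 0.0} -- V(s0) nudged toward the observed return
```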
00:09:38.620 | Great, so we've learned three interesting models here
00:09:43.580 | that hold true over multiple people, multiple tasks.
00:09:47.500 | And so my question is, if we take these together
00:09:50.260 | and add them together, how do we start to understand
00:09:55.260 | a task as quote unquote simple as chess?
00:09:58.340 | Which is to say, we could ask questions,
00:10:00.780 | how long would it take for a person to play?
00:10:04.740 | What mistakes would they make?
00:10:06.480 | After they played a few games,
00:10:08.380 | how would they adapt themselves?
00:10:10.460 | Or if we want to develop a system
00:10:12.180 | that ended up being good at chess,
00:10:14.340 | or at least learning to become better at chess.
00:10:16.980 | My question is, if you could,
00:10:19.060 | there doesn't seem to be a clear way
00:10:21.460 | to take these very, very individual theories
00:10:24.460 | and kind of smash them together
00:10:26.040 | and get a reasonable answer of how to play chess,
00:10:29.340 | or how do humans play chess?
00:10:30.900 | And so, the gentleman in this slide is Alan Newell,
00:10:37.300 | one of the founders of AI,
00:10:40.540 | did incredible work in psychology and other fields.
00:10:43.740 | He gave a series of lectures at Harvard in 1987,
00:10:47.820 | and they were published in 1990
00:10:49.460 | called the Unified Theories of Cognition.
00:10:51.660 | And his argument to the psychology community at that point
00:10:54.660 | was the argument on the prior slide.
00:10:56.980 | They had many individual studies, many individual results.
00:11:00.940 | And so the question was, how do you bring them together
00:11:03.260 | to gain this overall theory?
00:11:05.020 | How do you make forward progress?
00:11:06.980 | And so his proposal was Unified Theories of Cognition,
00:11:10.420 | which became known as cognitive architecture.
00:11:13.980 | Which is to say, to bring together your core assumptions,
00:11:18.980 | your core beliefs of what are the fixed mechanisms
00:11:22.500 | and processes that intelligent agents
00:11:25.820 | would use across tasks.
00:11:27.740 | So the representations, the learning mechanisms,
00:11:30.660 | the memory systems, bring them together,
00:11:34.500 | implement them in a theory, and use that across tasks.
00:11:38.580 | And the core idea is that when you actually have
00:11:41.220 | to implement this and see how it's going to work
00:11:43.300 | across different tasks, the interconnections
00:11:45.900 | between these different processes and representations
00:11:49.500 | would add constraint.
00:11:51.420 | And over time, the constraints would start limiting
00:11:54.220 | the design space of what is necessary
00:11:56.940 | and what is possible in terms of building
00:11:58.420 | intelligent systems.
00:11:59.620 | And so the overall goal from there was to understand
00:12:02.500 | and exhibit human-level intelligence
00:12:04.640 | using these cognitive architectures.
00:12:06.440 | A natural question to ask is, okay, so we've gone from
00:12:13.420 | a methodology of science that we understand
00:12:17.180 | how to operate in.
00:12:19.180 | We make a hypothesis, we construct a study,
00:12:22.860 | we gather our data, we evaluate that data,
00:12:25.700 | and we falsify or we do not falsify the original hypothesis.
00:12:29.180 | And we can do that over and over again,
00:12:30.860 | and we know that we're making forward progress
00:12:32.620 | scientifically.
00:12:33.700 | If I've now taken that model and changed it into,
00:12:36.740 | I have a piece of software,
00:12:38.940 | and it's representing my theories.
00:12:41.820 | And to some extent, I can configure that software
00:12:43.900 | in different ways to work on different tasks.
00:12:46.260 | How do I know that I'm making progress?
00:12:48.700 | And so there's a form of science called Lakatosian,
00:12:53.700 | and it's kind of shown pictorially here
00:12:56.180 | where you start with your core of what your beliefs are
00:13:00.940 | about where you're at, what is necessary
00:13:04.660 | for achieving the goal that you have.
00:13:07.100 | And around that, you'll have kind of ephemeral hypotheses
00:13:10.460 | and assumptions that over time may grow and shrink.
00:13:13.700 | And so you're trying out different things,
00:13:15.040 | trying out different things.
00:13:16.500 | And if an assumption is around there long enough,
00:13:19.100 | it becomes part of that core.
00:13:20.980 | And so as you work on more tasks and learn more,
00:13:24.660 | either by your work or by data coming in
00:13:26.660 | from someone else, the core is growing larger and larger.
00:13:30.500 | You've got more constraints and you've made more progress.
00:13:34.020 | And so what I wanted to look at were in this community,
00:13:38.060 | what are some of the core assumptions
00:13:39.900 | that are driving forward scientific progress?
00:13:42.140 | So one of them actually came out of those lectures
00:13:46.480 | that are referred to as Newell's time scales
00:13:48.340 | of human action.
00:13:49.300 | And so off on the left,
00:13:53.080 | the left two columns are both time units,
00:13:55.000 | just expressed somewhat differently.
00:13:57.220 | Second from the left being maybe more useful
00:13:59.240 | to a lot of us in understanding daily life.
00:14:03.160 | One step over from there would be kind of
00:14:05.140 | at what level processes are occurring.
00:14:07.260 | So the lowest three are down at kind of the substrate,
00:14:11.060 | the neuronal level.
00:14:12.520 | We're building up to deliberate tasks that occur
00:14:15.380 | in the brain and tasks that are operating
00:14:17.680 | on the order of 10 seconds.
00:14:19.360 | Some of these might occur in the psychology laboratory,
00:14:21.720 | but probably a step up to minutes and hours.
00:14:26.440 | And then above that really becomes interactions
00:14:28.500 | between agents over time.
00:14:29.900 | And so if we start with that,
00:14:32.160 | the things to take away is that regular,
00:14:35.600 | the hypothesis is that regularities will occur
00:14:38.760 | at these different time scales and that they're useful.
00:14:41.880 | And so those who operate at that lowest time scale
00:14:45.000 | might be considering neuroscience, cognitive neuroscience.
00:14:48.920 | When you shift up to the next couple levels,
00:14:51.040 | what we would think about in terms of the areas of science
00:14:53.800 | that deal with that would be psychology and cognitive
00:14:55.840 | science, and then we shift up a level and we're talking
00:14:58.280 | about sociology and economics and the interplay
00:15:01.600 | between agents over time.
00:15:04.720 | And so what we'll find with cognitive architecture
00:15:08.360 | is that most of them will tend to sit at the deliberate act.
00:15:11.760 | We're trying to take knowledge of a situation
00:15:14.480 | and make a single decision.
00:15:16.600 | And then sequences of decisions over time
00:15:19.040 | will build to tasks and tasks over time
00:15:21.520 | will build to more interesting phenomenon.
00:15:24.000 | I'm actually going to show that that isn't strictly true,
00:15:26.320 | that there are folks working in this field
00:15:27.800 | that actually do operate one level below.
00:15:30.560 | Some other assumptions.
00:15:33.640 | So this is Herb Simon receiving the Nobel Prize in economics
00:15:38.240 | and part of what he received that award for
00:15:41.480 | was an idea of bounded rationality.
00:15:44.600 | So in various fields, we tend to model humans as rational.
00:15:49.240 | And his argument was, let's consider that human beings
00:15:54.240 | are operating under various kinds of constraints.
00:15:58.600 | And so to model the rationality with respect to
00:16:01.480 | and bounded by how complex the problem is
00:16:04.320 | that they're working on, how big is that search space
00:16:06.480 | that they have to conquer.
00:16:08.800 | Cognitive limitations.
00:16:10.320 | So speed of operations, amount of memory,
00:16:15.160 | short term as well as long term,
00:16:16.680 | as well as other aspects of our computing infrastructure
00:16:20.000 | that are gonna keep us from being able to
00:16:22.000 | arbitrarily solve complex problems,
00:16:24.840 | as well as how much time is available to make that decision.
00:16:28.120 | And so this is actually a phrase that came out of his speech
00:16:32.400 | when he received the Nobel Prize.
00:16:34.080 | Decision makers can satisfice
00:16:36.680 | either by finding optimum solutions for a simplified world,
00:16:40.040 | which is to say, take your big problem,
00:16:42.120 | simplify it in some way, and then solve that.
00:16:44.680 | Or by finding satisfactory solutions
00:16:47.480 | for a more realistic world.
00:16:48.920 | Take the world in all its complexity,
00:16:50.720 | take the problem in all its complexity,
00:16:52.600 | and try to find something that works.
00:16:55.200 | Neither approach in general dominates the other,
00:16:57.080 | and both have continued to coexist.
00:16:59.120 | And so what you're actually going to see
00:17:01.320 | throughout the cognitive architecture community
00:17:03.240 | is this understanding that some problems
00:17:07.160 | you're not gonna be able to get an optimal solution to
00:17:09.760 | if you consider, for instance,
00:17:11.960 | bounded amount of computation, bounded time,
00:17:14.640 | the need to be reactive to a changing environment,
00:17:17.480 | these sorts of issues.
00:17:18.920 | And so in some sense, we can decompose problems
00:17:22.040 | that come up over and over again into simpler problems,
00:17:25.080 | solve those near optimally or optimally,
00:17:28.920 | fix those in, optimize those,
00:17:31.800 | but more general problems we might have to satisfice.
00:17:34.600 | There's also the idea of the physical symbol system hypothesis.
00:17:38.840 | So this is Alan Newell and Herb Simon there
00:17:43.640 | considering how a computer could play the game of chess.
00:17:46.600 | So the physical symbol system
00:17:49.600 | talks about the idea of taking something,
00:17:51.800 | some signal abstractly referred to as symbol,
00:17:55.720 | combining them in some ways to form expressions,
00:17:58.080 | and then having operations that produce new expressions.
00:18:02.440 | A weak interpretation of the idea that symbol systems
00:18:07.360 | are necessary and sufficient for intelligent systems,
00:18:09.800 | a very weak way of talking about it is the claim
00:18:12.520 | that there's nothing unique
00:18:14.760 | about the neuronal infrastructure that we have,
00:18:17.240 | but if we got the software right,
00:18:20.360 | we could implement it in the bits, bytes,
00:18:23.080 | RAM, and processor that make up modern computers.
00:18:25.640 | That's kind of the weakest way to look at this,
00:18:27.840 | that we can do it with silicon and not carbon.
00:18:32.000 | Stronger way that this used to be looked at
00:18:36.720 | as more of a logical standpoint,
00:18:38.560 | which is to say if we can encode rules of logic,
00:18:42.600 | these tend to line up if we think intuitively
00:18:45.240 | of planning and problem solving.
00:18:47.920 | And if we can just get that right
00:18:49.480 | and get enough facts in there and enough rules in there
00:18:52.360 | that somehow, well,
00:18:54.240 | that's what we need for intelligence,
00:18:55.920 | and eventually we can get
00:18:58.440 | to the point of intelligence.
00:19:01.160 | And that was a starting point that lasted for a while.
00:19:04.800 | I think by now most folks in this field
00:19:08.280 | would agree that that's necessary
00:19:11.440 | to be able to operate logically,
00:19:13.200 | but that there are going to be representations
00:19:15.320 | and processes that'll benefit
00:19:17.640 | from non-symbolic representation,
00:19:19.160 | so particularly perceptual processing,
00:19:21.600 | visual, auditory, and processing things
00:19:24.600 | in a more kind of standard machine learning sort of way,
00:19:27.560 | as well as kind of taking advantage
00:19:31.960 | of statistical representations.
00:19:34.720 | So we're getting closer to actually looking
00:19:39.320 | at cognitive architectures.
00:19:41.160 | I did want to go back to the idea
00:19:42.960 | that different researchers are coming
00:19:45.240 | with different research foci,
00:19:46.840 | and we'll start off with kind of the lowest level
00:19:51.640 | in understanding biological modeling.
00:19:54.000 | So Lieber and Spahn both try to model
00:19:57.160 | different degrees of low-level details,
00:20:00.480 | parameters, firing rates,
00:20:03.160 | connectivities between different kind of levels
00:20:07.080 | of neuronal representations.
00:20:09.600 | They build that up,
00:20:10.440 | and then they try to build tasks above that layer,
00:20:13.280 | but always being very cautious about being
00:20:16.720 | true to human biological processes.
00:20:23.040 | And a layer above there would be psychological modeling,
00:20:25.800 | which is to say trying to build systems
00:20:29.040 | that are true in some sense to areas of the brain,
00:20:32.600 | interactions in the brain,
00:20:33.760 | and being able to predict errors that we made,
00:20:37.720 | timing that we produced by the human mind.
00:20:40.400 | And so there I'll talk a little bit about ACT-R.
00:20:43.160 | This final level down here,
00:20:46.040 | these are systems that are focused
00:20:47.720 | mainly on producing functional systems
00:20:50.400 | that exhibit really cool artifacts
00:20:55.400 | and solve really cool problems.
00:20:57.080 | And so I'll spend most of the time
00:20:58.520 | talking about SOAR,
00:20:59.560 | but I wanted to point out a relative newcomer
00:21:02.360 | in the game called Sigma.
00:21:03.720 | So to talk about Spahn a little bit,
00:21:07.520 | we'll see if the sound works in here.
00:21:10.000 | I'm going to let the creator take this one.
00:21:14.160 | Or not.
00:21:17.680 | See how the AV system likes this.
00:21:23.480 | There we go.
00:21:28.160 | (soft music)
00:21:30.560 | - My name is Chris Eliasmith,
00:21:32.760 | and I'm the director of the Centre for Theoretical
00:21:34.480 | Neuroscience at the University of Waterloo.
00:21:36.760 | And I'm actually jointly appointed
00:21:38.000 | between philosophy and engineering.
00:21:39.800 | The philosophy allows me to consider
00:21:42.000 | general conceptual issues about how the mind works.
00:21:44.600 | But of course, if I want to make claims
00:21:46.240 | about how the mind works,
00:21:47.400 | I have to understand also how the brain works.
00:21:49.000 | And this is where engineering plays a critical role.
00:21:51.440 | Engineering allows me to break down equations
00:21:54.000 | and very precise descriptions,
00:21:55.520 | which we can test by building actual models.
00:21:57.960 | One model that we built recently
00:21:59.200 | is called the Spaun model.
00:22:00.800 | This model, Spaun, has about two and a half million
00:22:03.120 | individual neurons that are simulated in it.
00:22:05.440 | And the input to the model is an eye,
00:22:07.400 | and the output from the model is a movement of an arm.
00:22:10.680 | So essentially, it can see images of numbers
00:22:13.040 | and then do something like categorize them,
00:22:15.160 | in which case it would just draw the number that it sees.
00:22:17.480 | Or it can actually try to reproduce the style
00:22:19.040 | of the number that it's looking at.
00:22:20.680 | So for instance, if it sees a loopy two,
00:22:22.720 | a two with a big loop on the bottom,
00:22:24.040 | it can actually reproduce that particular style too.
00:22:27.200 | On the medical side, we all know that
00:22:29.160 | we have cognitive challenges that show up
00:22:30.960 | as we get older, and we can try to address those challenges
00:22:33.720 | by simulating the aging process with these kinds of models.
00:22:36.520 | Another potential area of impact
00:22:38.000 | is on artificial intelligence.
00:22:39.840 | A lot of work in artificial intelligence
00:22:41.600 | attempts to build agents that are extremely good
00:22:43.320 | at one task, for instance, playing chess.
00:22:45.640 | What's special about Spaun
00:22:47.240 | is that it's quite good at many different tasks.
00:22:49.320 | And this adds the additional challenge
00:22:51.080 | of trying to figure out how to coordinate
00:22:52.560 | the flow of information through different parts
00:22:54.440 | of the model, something that animals
00:22:56.120 | seem to be very good at.
00:22:57.320 | So I'll provide a pointer at the end.
00:23:04.080 | He's got a really cool book called "How to Build a Brain."
00:23:06.440 | And if you Google him, you can, Google Spahn,
00:23:09.280 | you can find a toolkit where you can kind of
00:23:12.840 | construct circuits that will approximate functions
00:23:16.080 | that you're interested in, connect them together,
00:23:18.720 | set certain properties that you would want at a low level,
00:23:22.240 | and build them up, and actually work on tasks
00:23:25.800 | at the level of vision and robotic actuation.
00:23:28.880 | So that's a really cool system.
00:23:31.400 | As we move into architectures that are sitting
00:23:35.600 | above that biological level, I wanted to give you
00:23:39.040 | kind of an overall sense of what they're going to look like,
00:23:41.040 | what a prototypical architecture is going to look like.
00:23:44.240 | So they're gonna have some ability to have perception.
00:23:47.440 | The modalities typically are more digital symbolic,
00:23:52.520 | but they will, depending on the architecture,
00:23:56.000 | be able to handle vision, audition,
00:23:59.960 | and various sensory inputs.
00:24:02.440 | These will get represented in some sort
00:24:04.360 | of short-term memory, whatever the state representation
00:24:07.160 | for the particular system is.
00:24:08.680 | It's typical to have a representation of the knowledge
00:24:14.040 | of what tasks can be performed,
00:24:16.120 | when they should be performed, how they should be controlled.
00:24:18.960 | And so these are typically both actions
00:24:21.380 | that take place internally that manage
00:24:24.240 | the internal state of the system,
00:24:26.760 | and perform internal computations,
00:24:28.840 | but also about external actuation.
00:24:31.600 | And external might be a digital system, a game AI,
00:24:34.760 | but it might also be some sort of
00:24:36.560 | robotic actuation in the real world.
00:24:38.360 | There's typically some sort of mechanism
00:24:41.560 | by which to select from the available actions
00:24:44.880 | in a particular situation.
00:24:46.640 | There's typically some way to augment
00:24:49.760 | this procedural information, which is to say,
00:24:52.800 | learn about new actions, possibly modify existing ones.
00:24:56.060 | There's typically some semblance
00:24:58.240 | of what's called declarative memory.
00:25:00.520 | So whereas procedural, at least in humans,
00:25:03.020 | if I asked you to describe how to ride a bike,
00:25:07.480 | you might be able to say, get on the seat and pedal,
00:25:11.760 | but in terms of keeping your balance there,
00:25:13.680 | you'd have a pretty hard time describing it declaratively.
00:25:18.180 | So that's kind of the procedural side,
00:25:20.300 | the implicit representation of knowledge,
00:25:22.220 | whereas declarative would include facts, geography, math,
00:25:27.220 | but it could also include experiences that the agent has had,
00:25:30.520 | a more episodic representation of declarative memory.
00:25:33.300 | And they'll typically have some way
00:25:34.820 | of learning this information, augmenting it over time.
00:25:38.340 | And then finally, some way of taking actions in the world.
00:25:42.180 | And they'll all have some sort of cycle,
00:25:45.260 | which is perception comes in,
00:25:47.540 | knowledge that the agent has is brought to bear on that,
00:25:51.220 | an action is selected,
00:25:52.980 | knowledge that knows to condition on that action
00:25:55.240 | will act accordingly, both with internal processes,
00:25:57.980 | as well as eventually to take action,
00:25:59.780 | and then rinse and repeat.
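To make that cycle concrete, here is a toy sketch of the perceive/propose/select/act loop (my own illustration of the prototypical structure, not any particular architecture's API; all names are hypothetical):

```python
def cognitive_cycle(perceive, rules, select, act, steps=10):
    """Toy version of the prototypical cycle: perception updates
    short-term memory, procedural rules propose actions, one action
    is selected and executed, and then the loop repeats."""
    short_term_memory = {}
    for _ in range(steps):
        short_term_memory["percept"] = perceive()
        proposals = [p for rule in rules for p in rule(short_term_memory)]
        if proposals:
            act(select(proposals), short_term_memory)

# Example: a thermostat-like agent with one hand-written rule.
cognitive_cycle(
    perceive=lambda: 18,  # degrees C
    rules=[lambda stm: ["heat_on"] if stm["percept"] < 20 else ["heat_off"]],
    select=lambda proposals: proposals[0],
    act=lambda action, stm: stm.update(last_action=action),
)
```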
00:26:01.180 | So when we talk about, in an AI system, an agent,
00:26:06.220 | in this context, that would be the fixed representation,
00:26:09.120 | which is whatever architecture we're talking about,
00:26:11.580 | plus a set of knowledge that is typically specific
00:26:15.620 | to the task, but might be more general.
00:26:17.540 | So oftentimes, these systems could incorporate
00:26:20.020 | a more general knowledge base of facts,
00:26:23.340 | of linguistic facts, of geographic facts.
00:26:27.340 | Let's take Wikipedia,
00:26:28.660 | and let's just stick it in the brain of the system,
00:26:30.600 | that'd be more task in general.
00:26:32.460 | But then also, whatever it is that you're doing right now,
00:26:35.240 | how should you proceed in that?
00:26:36.840 | And then it's typical to see this processing cycle.
00:26:40.720 | And going back to the prior assumption,
00:26:43.660 | the idea is that these primitive cycles
00:26:47.340 | allow for the agent to be reactive to its environment.
00:26:50.620 | So if new things come in that it has to react to,
00:26:52.860 | if the lion's sitting over there,
00:26:54.180 | I better run and maybe not do my calculus homework, right?
00:26:57.300 | So as long as this cycle is going, I'm reactive,
00:27:01.100 | but at the same time, if multiple actions
00:27:03.200 | are taken over time, I'm able to get complex behavior
00:27:06.260 | over the long term.
00:27:07.440 | So this is the ACT-R cognitive architecture.
00:27:13.620 | It has many of the kind of core pieces
00:27:16.340 | that I talked about before.
00:27:18.580 | Let's see if the, is the mouse,
00:27:21.740 | yes, mouse is useful up there.
00:27:23.920 | So we have the procedural model here.
00:27:26.460 | A short term memory is going to be these buffers
00:27:29.180 | that are on the outside.
00:27:31.280 | The procedural memory is encoded as
00:27:33.440 | what are called production rules, or if-then rules.
00:27:37.300 | If this is the state of my short term memory,
00:27:40.160 | this is what I think should happen as a result.
00:27:42.860 | You have a selection of the appropriate rule to fire
00:27:47.860 | and an execution.
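Purely as an illustration (this is hypothetical Python, not ACT-R's actual Lisp syntax), a production rule can be thought of as an if-then pattern over buffer contents, with a matching rule selected and fired:

```python
# Hypothetical buffers and one production, loosely in the spirit of
# "if this is the state of my short-term memory, then do this."
buffers = {"goal": {"task": "add", "a": 2, "b": 3}, "retrieval": None}

def add_fact_request(bufs):
    """If the goal is to add and nothing has been retrieved yet,
    request the relevant fact from declarative memory."""
    goal = bufs["goal"]
    if goal and goal.get("task") == "add" and bufs["retrieval"] is None:
        return {"retrieval": {"request": ("add", goal["a"], goal["b"])}}
    return None  # conditions not met, rule does not match

matched = [a for rule in (add_fact_request,) if (a := rule(buffers))]
if matched:
    buffers.update(matched[0])  # fire the selected production
print(buffers["retrieval"])     # {'request': ('add', 2, 3)}
```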
00:27:50.060 | You're seeing associated parts of the brain
00:27:53.260 | being represented here.
00:27:54.560 | A cool thing that has been done over time
00:27:56.260 | in the ACT-R community is to make predictions
00:28:01.260 | about brain areas and then perform MRIs
00:28:04.540 | and gather that data and correlate that data.
00:28:06.420 | So when you use the system, you will get predictions
00:28:09.460 | about things like timing of operations,
00:28:12.620 | errors that will occur, probabilities
00:28:14.720 | that something is learned, but you'll also get predictions
00:28:17.340 | about, to the degree that they can,
00:28:19.740 | kind of brain areas that are going to light up.
00:28:22.420 | And if you want to, that's actively being developed
00:28:27.480 | at Carnegie Mellon.
00:28:28.760 | To the left is John Anderson, who developed
00:28:32.700 | this cognitive architecture, ooh, 30-ish years ago.
00:28:38.940 | And until the last about five years,
00:28:40.700 | he was the primary researcher/developer behind it
00:28:43.740 | with Christian, and then recently,
00:28:45.620 | he's decided to spend more time
00:28:47.780 | on cognitive tutoring systems.
00:28:50.200 | And so Christian has become the primary developer.
00:28:53.500 | There is an annual ACT-R workshop.
00:28:57.260 | There's a summer school, which if you're thinking
00:29:01.380 | about modeling a particular task,
00:29:03.320 | you can kind of bring your task to them,
00:29:04.820 | bring your data, they teach you how to use the system,
00:29:07.040 | and try to get that study going right there on the spot.
00:29:09.940 | To give you a sense of what kinds of tasks
00:29:14.280 | this could be applied to, so this is representative
00:29:18.960 | of a certain class of tasks, certainly not the only one.
00:29:21.860 | Let's try this again.
00:29:25.760 | Think PowerPoint's gonna want a restart every time.
00:29:28.720 | Okay, so we're getting predictions
00:29:31.600 | about basically where the eye is going to move.
00:29:34.240 | What you're not seeing is it's actually processing
00:29:36.880 | things like text and colors and making predictions
00:29:39.520 | about what to do and how to represent the information
00:29:41.920 | and how to process the graph as a whole.
00:29:44.640 | I had alluded to this earlier.
00:29:47.120 | There's work by Bonnie John, very similar,
00:29:50.800 | so making predictions about how humans
00:29:52.660 | would use computer interfaces.
00:29:55.280 | At the time, she got hired away by IBM,
00:29:57.800 | and so they wanted the ability to have software
00:29:59.880 | that you can put in front of software designers,
00:30:03.460 | and when they think they have a good interface,
00:30:05.280 | press a button, this model of human cognition
00:30:08.320 | would try to perform the tasks that it had been told to do
00:30:11.540 | and make predictions about how long it would take,
00:30:13.280 | and so you can have this tight feedback loop
00:30:16.060 | from designers saying, "Here's how good
00:30:17.480 | "your particular interface is."
00:30:20.840 | So ACT-R as a whole is very prevalent in this community.
00:30:24.040 | I went to their webpage and counted up
00:30:25.820 | just the papers that they knew about.
00:30:28.700 | It was over 1,100 papers over time.
00:30:31.300 | If you're interested in it, the main distribution
00:30:34.560 | is in Lisp, but many people have used this
00:30:38.120 | and wanted to apply it to systems
00:30:39.560 | that need a little more processing power.
00:30:42.420 | So the NRL has a Java port of it
00:30:45.480 | that they use in robotics.
00:30:47.040 | The Air Force Research Lab in Dayton
00:30:49.960 | has implemented it in Erlang for parallel processing
00:30:54.180 | of large declarative knowledge bases.
00:30:56.280 | They're trying to do service-oriented architectures with it,
00:30:59.760 | CUDA, because they want what it has to say.
00:31:02.880 | They don't want to wait around for it
00:31:04.040 | to have to figure that stuff out.
00:31:05.640 | So that's the two minutes about ACT-R.
00:31:10.840 | Sigma is a relative newcomer, and it's developed out
00:31:16.440 | at the University of Southern California
00:31:18.000 | by a man named Paul Rosenbloom,
00:31:20.280 | and I'll mention him in a couple minutes
00:31:21.880 | because he was one of the prime developers
00:31:23.480 | of SOAR at Carnegie Mellon.
00:31:26.100 | So he knows a lot about how SOAR works,
00:31:27.880 | and he's worked on it over the years.
00:31:30.040 | And I think originally, I'm gonna speak for him,
00:31:32.480 | and he'll probably say I was wrong.
00:31:34.800 | I think originally it was kind of a mental exercise
00:31:37.720 | of can I reproduce SOAR using a uniform substrate?
00:31:42.720 | I'll talk about SOAR in a little bit.
00:31:45.120 | It's 30 years of research code.
00:31:47.080 | If anybody's dealt with research code,
00:31:49.080 | it's 30 years of C and C++
00:31:52.040 | with dozens of graduate students over time.
00:31:55.560 | It's not pretty at all.
00:31:57.920 | And theoretically, it's got these boxes sitting out here,
00:32:00.960 | and so he re-implemented the core functionality of SOAR
00:32:05.880 | all using factor graphs and message-passing algorithms
00:32:09.200 | under the hood.
00:32:10.120 | He got to that point and then said,
00:32:12.800 | there's nothing stopping me from going further.
00:32:15.040 | And so now I can do all sorts of modern machine learning,
00:32:18.180 | vision optimization sort of things
00:32:20.460 | that would take some time in any other architecture
00:32:23.640 | to be able to integrate well.
00:32:25.840 | So it's been an interesting experience.
00:32:29.100 | It's now gonna be the basis for the Virtual Human Project
00:32:32.240 | out at the Institute for Creative Technologies.
00:32:34.480 | It's an institute associated
00:32:35.840 | with the University of Southern California.
00:32:38.720 | For a while, until recently, you couldn't get your hands on it,
00:32:41.600 | but in the last couple of years,
00:32:42.680 | he's done some tutorials on it.
00:32:44.960 | He's got a public release with documentation.
00:32:47.880 | So that's something interesting to keep an eye on.
00:32:50.320 | But I'm gonna spend all the remaining time
00:32:53.080 | on the SOAR cognitive architecture.
00:32:55.960 | And so you see, it looks quite a bit
00:32:57.840 | like the prototypical architecture.
00:33:00.380 | And I'll give a sense again about how this all operates.
00:33:04.060 | Give a sense of the people involved.
00:33:05.940 | We already talked about Alan Newell.
00:33:07.740 | So both John Laird, who is my advisor,
00:33:10.620 | and Paul Rosenblum were students of Alan Newell.
00:33:14.980 | John's thesis project was related
00:33:18.500 | to the chunking mechanism in SOAR,
00:33:21.080 | which learns new rules based upon sub-goal reasoning.
00:33:26.180 | So he finished that, I believe, the year I was born.
00:33:31.180 | And so he's one of the few researchers you'll find
00:33:34.480 | who's still actively working on their thesis project.
00:33:37.780 | Beyond that, I think about 10 years ago,
00:33:42.840 | he founded SOAR Technology,
00:33:44.300 | which is a company up in Ann Arbor, Michigan.
00:33:47.000 | While it's called SOAR Technology,
00:33:48.160 | it doesn't do exclusively SOAR,
00:33:49.480 | but that's a part of the portfolio.
00:33:51.240 | General intelligence system stuff,
00:33:54.080 | a lot of defense association.
00:33:56.220 | So some notes of what's gonna make SOAR different
00:34:00.880 | from the other architectures that fall
00:34:03.120 | into this kind of functional architecture category.
00:34:06.060 | A big thing is a focus on efficiency.
00:34:08.720 | So John wants to be able to run SOAR on just about anything.
00:34:12.700 | We just got on the SOAR mailing list
00:34:15.280 | a desire to run it on a real-time processor.
00:34:19.200 | And our answer, while we had never done it before,
00:34:21.760 | was probably it'll work.
00:34:25.680 | Every release, there's timing tests.
00:34:27.960 | And we always, what we look at is,
00:34:30.760 | in a bunch of different domains
00:34:32.040 | for a bunch of different reasons
00:34:33.020 | that relate to human processing,
00:34:34.800 | there's this magic number that comes out,
00:34:36.160 | which is 50 milliseconds, which is to say,
00:34:39.040 | in terms of responding to tasks,
00:34:42.200 | if you're above that time, humans will sense a delay.
00:34:46.480 | And you don't want that to happen.
00:34:48.080 | Now, if we're working in a robotics task, 50 milliseconds,
00:34:51.220 | if you're dramatically above that,
00:34:52.760 | you just fell off the curb, or worse,
00:34:55.020 | or you just hit somebody in a car, right?
00:34:57.240 | So we're trying to keep that as low as possible.
00:34:59.560 | And for most agents, it doesn't even register.
00:35:02.640 | It's below one millisecond, fractions of millisecond.
00:35:05.960 | But I'll come back to this,
00:35:07.400 | because a lot of the work that I was doing
00:35:09.000 | was computer science, AI,
00:35:12.080 | and a lot of efficient algorithms and data structures.
00:35:14.380 | And 50 milliseconds was that very high upper bound.
00:35:17.380 | It's also one of the projects
00:35:19.760 | that has a public distribution.
00:35:21.160 | You can get it in all sorts of operating systems.
00:35:24.800 | We use something called SWIG
00:35:26.000 | that allows you to interface with it
00:35:27.480 | in a bunch of different languages.
00:35:28.480 | We kind of describe the meta description,
00:35:30.000 | and you are able to basically generate bindings
00:35:33.640 | in a bunch of different platforms.
00:35:35.760 | Core is C++.
00:35:38.360 | There was a team at SoarTech that said,
00:35:40.400 | "We don't like C++.
00:35:41.720 | "It gets messy."
00:35:42.760 | So they actually did a port over to pure Java,
00:35:45.920 | in case that appeals to you.
00:35:47.560 | There's an annual Soar workshop that takes place
00:35:51.000 | in Ann Arbor, typically.
00:35:52.520 | It's free.
00:35:54.200 | You can go there, get a Soar tutorial,
00:35:55.920 | and talk to folks who are working on Soar.
00:35:57.480 | And it's fun.
00:35:58.600 | I've been there every year but one in the last decade.
00:36:01.640 | It's just fun to see the people around the world
00:36:03.480 | that are using the system in all sorts of interesting ways.
00:36:06.620 | To give you a sense of the diversity of the applications,
00:36:10.160 | one of the first was R1 Sor,
00:36:12.480 | which was back in the days
00:36:13.760 | when it was an actual challenge to build a computer,
00:36:16.520 | which is to say that your choice of certain components
00:36:20.120 | would have radical implications
00:36:23.120 | for other parts of the computer.
00:36:24.480 | So it wasn't just the Dell website where you just,
00:36:26.800 | "I want this much RAM, I want this much CPU."
00:36:29.120 | There was a lot of thinking that went behind it,
00:36:30.720 | and then physical labor that went to construct your computer.
00:36:33.760 | And so it was making that process a lot better.
00:36:37.120 | There are folks that apply it
00:36:38.040 | to natural language processing.
00:36:39.780 | Sor 7 was the core of the Virtual Humans Project
00:36:43.640 | for a long time.
00:36:44.920 | HCI tasks.
00:36:46.520 | TAC Air Sor was one of the largest rule-based systems.
00:36:49.520 | Tens of thousands of rules over 48 hours.
00:36:52.040 | It was a very large-scale simulation,
00:36:54.080 | a defense simulation.
00:36:56.360 | Lots of games it's been applied to for various reasons.
00:36:59.780 | And then in the last few years,
00:37:02.640 | porting it onto mobile robotics platforms.
00:37:04.840 | This is Edwin Olson's SplinterBot,
00:37:07.840 | an early version of it
00:37:08.980 | that went on to win the Magic competition.
00:37:11.620 | Then I went on to put Sor on the web.
00:37:16.780 | And if after this talk you're really interested
00:37:18.960 | in a dice game that I'm gonna talk about,
00:37:20.760 | you can actually go to the iOS app store and download.
00:37:25.000 | It's called Michigan Liars Dice.
00:37:26.360 | It's free.
00:37:27.240 | You don't have to pay for it.
00:37:28.580 | But you can actually play Liars Dice with Sor.
00:37:32.920 | And you can set the difficulty level.
00:37:35.240 | It's pretty good.
00:37:36.720 | It beats me on a regular basis.
00:37:39.320 | I wanted to give you a couple other
00:37:40.600 | just kind of really weird-feeling sort of applications
00:37:44.200 | and really cool applications.
00:37:46.220 | The first one is out of Georgia Tech.
00:37:49.920 | Go PowerPoint.
00:37:51.880 | (upbeat music)
00:37:54.460 | - Lumini is a dome-based interactive art installation
00:38:00.920 | in which human participants can engage
00:38:03.160 | in collaborative movement improvisation
00:38:05.240 | with each other and virtual dance partners.
00:38:08.580 | This interaction creates a hybrid space
00:38:11.120 | in which virtual and corporeal bodies meet.
00:38:14.600 | The line between human and non-human is blurred,
00:38:17.840 | spurring participants to examine
00:38:19.600 | their relationship with technology.
00:38:22.480 | The LuminAI installation ultimately examines
00:38:25.200 | how humans and machine can co-create experiences.
00:38:28.840 | And it does so in a playful environment.
00:38:31.960 | The dome creates a social space
00:38:33.880 | that encourages human-human interaction
00:38:36.040 | and collective dance experiences,
00:38:38.520 | allowing participants to creatively explore movement
00:38:41.520 | while having fun.
00:38:44.160 | The development of LuminAI has been a hybrid exploration
00:38:47.360 | in art forms of theater and dance,
00:38:49.880 | as well as research in artificial intelligence
00:38:52.400 | and cognitive science.
00:38:54.000 | LuminAI draws on inspiration
00:38:57.560 | from the ancient art form of shadow theater.
00:39:00.760 | The original two-dimensional version of the installation
00:39:03.800 | led to the conceptualization of the dome
00:39:06.120 | as a liminal space, with human silhouettes
00:39:09.200 | and virtual characters meeting to dance together
00:39:11.880 | on the projection surface.
00:39:14.560 | Rather than relying on a pre-authored library
00:39:17.200 | of movement responses,
00:39:19.000 | the virtual dancer learns its partner's movements
00:39:21.840 | and utilizes Viewpoints movement theory
00:39:24.080 | to systematically reason about them
00:39:26.000 | in order to improvisationally choose a movement response.
00:39:29.520 | Viewpoints theory is based in dance and theater
00:39:33.360 | and analyzes the performance along the dimensions
00:39:35.960 | of tempo, duration, repetition,
00:39:39.560 | kinesthetic response, shape, spatial relationships,
00:39:43.720 | gesture, architecture, and movement topography.
00:39:47.560 | The virtual dancer is able to use
00:39:50.760 | several different strategies to respond to human movements.
00:39:54.440 | These include mimicry of the movement,
00:40:00.040 | transformation of the movement along Viewpoints dimensions,
00:40:03.080 | recalling a similar or complementary movement from memory
00:40:05.360 | in terms of Viewpoints dimensions,
00:40:05.360 | and applying action-response patterns
00:40:07.400 | that the agent has learned while dancing
00:40:09.440 | with its human partner.
00:40:11.000 | - The reason we did this is this is part
00:40:15.680 | of a larger effort in our lab
00:40:17.560 | for understanding the relationship between
00:40:19.480 | computation, cognition, and creativity,
00:40:23.280 | where a large amount of our efforts
00:40:25.080 | go into understanding human creativity
00:40:27.640 | and how we make things together,
00:40:28.920 | how we're creative together,
00:40:30.560 | as a way to help us understand
00:40:32.000 | how we can build co-creative AI
00:40:35.240 | that serves the same purpose,
00:40:37.280 | where it can be a colleague and collaborate with us
00:40:39.960 | and create things with us.
00:40:41.480 | - So Brian was a graduate student
00:40:50.840 | in John Laird's lab as well.
00:40:53.320 | Before I start this, I alluded to this earlier
00:40:57.360 | where we're getting closer to Rosie saying,
00:40:59.960 | "Can you teach me?"
00:41:01.080 | So let me give you some introduction to this.
00:41:02.840 | In the lower left, you're seeing the view
00:41:05.560 | of a Kinect camera onto a flat surface.
00:41:08.360 | There's a robotic arm, mainly 3D-printed parts,
00:41:12.720 | few servos.
00:41:14.560 | Above that, you're seeing an interpretation of the scene.
00:41:19.280 | We're giving it associations of the four areas
00:41:22.480 | with semantic titles, like one is the table,
00:41:26.320 | one is the garbage, just semantic terms for areas.
00:41:30.480 | But other than that, the agent doesn't actually
00:41:32.480 | know all that much, and it's going to operate
00:41:34.880 | in two modalities.
00:41:35.920 | One is, we'll call it natural language,
00:41:38.200 | natural-ish language, a restricted subset of English,
00:41:43.200 | as well as some quote-unquote pointing.
00:41:46.000 | So you're going to see some mouse pointers
00:41:48.200 | in the upper left saying, "I'm talking about this."
00:41:51.120 | And this is just a way to indicate location.
00:41:54.440 | And so starting off, we're going to say things like,
00:41:56.880 | "Pick up the blue block," and it's going to be like,
00:41:58.560 | "I don't know what blue is.
00:42:00.120 | "What is blue?"
00:42:01.840 | We say, "Oh, well, that's a color."
00:42:03.440 | "Okay, so go get the green thing."
00:42:08.440 | "What's green?"
00:42:09.320 | "Oh, it's a color."
00:42:10.140 | "Okay, move the blue thing to a particular location."
00:42:13.440 | "Where's that?"
00:42:14.680 | Point it, "Okay, what is moving?"
00:42:17.360 | Really, it has to start from the beginning,
00:42:19.480 | and it's described, and it's said,
00:42:20.820 | "Okay, now you've finished."
00:42:22.500 | And once we got to that point, now I can say,
00:42:25.320 | "Move the green thing over here,"
00:42:27.360 | and it's got everything that it needs
00:42:29.280 | to be able to then reproduce the task,
00:42:31.280 | given new parameters, and it's learned that ability.
00:42:33.800 | So let me give it a little bit of time.
00:42:37.020 | So you can look a little bit at top left
00:42:47.720 | in terms of the pointers.
00:42:48.800 | You're going to see some text commands being entered.
00:42:52.060 | So what kind of attribute is blue?
00:42:56.120 | We're going to say it's a color,
00:42:57.720 | and so that can map it then
00:42:59.000 | to a particular sensor modality.
00:43:02.000 | This is green, so the pointing,
00:43:04.000 | what kind of thing is green?
00:43:05.080 | Okay, color, so now it knows how to understand
00:43:07.080 | blue and green as colors with respect to the visual scene.
00:43:10.880 | Move rectangle to the table.
00:43:13.940 | What is rectangle?
00:43:17.640 | Okay, now I can map that onto,
00:43:19.440 | or understanding parts of the world.
00:43:22.080 | Is this the blue rectangle?
00:43:23.200 | So the arm is actually pointing itself
00:43:26.040 | to get confirmation from the instructor,
00:43:28.600 | and then we're trying to understand,
00:43:30.520 | in general, when you say move something,
00:43:32.140 | what is the goal of this operation?
00:43:35.000 | And so then it also has a declarative representation
00:43:37.280 | of the idea of this task,
00:43:38.600 | not only that it completed it,
00:43:40.240 | then it can look back on having completed the task
00:43:42.880 | and understand what were the steps
00:43:44.840 | that led to achieving a particular goal.
00:43:47.120 | So in order to move it, you're going to have to pick it up.
00:43:54.320 | It knows which one the blue thing is.
00:43:56.360 | (mouse clicking)
00:43:59.120 | Great.
00:44:01.560 | Now put it in the table.
00:44:06.960 | So that's a particular location.
00:44:08.560 | At this point, we can say, you're done.
00:44:12.440 | You have accomplished the move the blue rectangle
00:44:14.960 | to the table.
00:44:16.040 | And so now it can understand what that very simple
00:44:18.960 | kind of process is like,
00:44:21.320 | and associate that with the verb to move.
00:44:25.460 | And now we can say move the green object, or not,
00:44:28.940 | to the garbage.
00:44:32.740 | And without any further interaction,
00:44:36.780 | based on everything that learned up till that point,
00:44:40.400 | it can successfully complete that task.
00:44:42.580 | So this is the work of Shiwali Mohan and others
00:44:45.240 | at the SOAR group at the University of Michigan
00:44:47.060 | on the Rosie project.
00:44:49.460 | And they're extending this to playing games
00:44:51.560 | and learning the rules of games
00:44:53.200 | through text-based descriptions and multimodal experience.
00:44:56.920 | So in order to build up to here's a story in SOAR,
00:45:01.040 | I wanted to give you a sense of how research occurs
00:45:03.120 | in the group.
00:45:04.000 | And so there are these back-and-forths that occur over time.
00:45:07.840 | There's this piece of software called SOAR,
00:45:10.720 | and we want to make this thing better
00:45:11.800 | and give it new capabilities,
00:45:13.040 | and so all our agents are going to become better.
00:45:16.080 | And we always have to keep in mind,
00:45:17.360 | and you'll see this as I go further,
00:45:19.200 | that it has to be useful to a wide variety of agents.
00:45:22.400 | It has to be task independent,
00:45:25.040 | and it has to be efficient.
00:45:26.100 | For us to do anything in the architecture,
00:45:27.640 | all of those have to hold true.
00:45:29.300 | So we do something cool in the architecture,
00:45:32.920 | and then we say, okay, let's solve a cool problem.
00:45:35.320 | So let's build some agents to do this.
00:45:37.160 | And so this ends up testing what are the limitations,
00:45:40.360 | what are the issues that arise in a particular mechanism,
00:45:43.980 | as well as integration with others.
00:45:45.880 | And we get to solve interesting problems.
00:45:47.400 | We usually find there was something missing,
00:45:48.920 | and then we can go back to the architecture
00:45:50.800 | and rinse and repeat.
00:45:52.800 | Just to give you an idea, again, how SOAR works.
00:45:55.720 | So the working memory is actually
00:45:57.200 | a directed connected graph.
00:45:59.400 | The perception is just a subset of that graph,
00:46:02.280 | and so there's going to be symbolic representations
00:46:04.600 | of most of the world.
00:46:06.320 | There is a visual subsystem
00:46:07.540 | in which you can provide a scene graph,
00:46:09.580 | just not showing it here.
00:46:11.080 | Actions are also a subset of that graph,
00:46:14.040 | and so the procedural knowledge,
00:46:16.880 | which is encoded as production rules,
00:46:18.320 | can read sections of the input,
00:46:20.580 | modify sections of the output,
00:46:21.920 | as well as arbitrary parts of the graph to take actions.
00:46:25.260 | So the decision procedure says,
00:46:27.000 | of all the things that I know to do,
00:46:28.440 | and I've kind of ranked them
00:46:29.400 | according to various preferences,
00:46:31.340 | what single thing should I do?
00:46:33.000 | There's semantic memory for facts, and there's episodic memory.
00:46:38.120 | The agent is always actually storing
00:46:40.600 | every experience it's ever had over time in episodic memory,
00:46:43.520 | and it has the ability to get back to that.
00:46:45.880 | And so the similar cycle we saw before,
00:46:48.360 | we get input in this perception called the input link.
00:46:51.420 | Rules are going to fire all in parallel and say,
00:46:54.420 | here's everything I know about the situation,
00:46:55.980 | here's all the things I could do.
00:46:57.460 | Decision procedure says, here's what we're going to do.
00:47:01.860 | Based upon the selected operator,
00:47:04.080 | all sorts of things could happen
00:47:05.340 | with respect to memories providing input,
00:47:08.660 | rules firing to perform computations,
00:47:11.180 | and as well as potentially output in the world.
00:47:13.700 | And remember, agent reactivity is required.
00:47:18.180 | We want the system to be able to react
00:47:21.240 | to things in the world at a very quick pace.
00:47:24.200 | So anything that happens in this cycle, at max,
00:47:26.960 | the overall cycle has to be under 50 milliseconds.
00:47:29.840 | And so that's going to be a constraint we hold ourselves to.
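To make that concrete, here is a minimal Python sketch of the cycle just described: rules elaborate and propose in parallel, a single decision procedure picks one operator, and the whole loop is held against the 50 millisecond budget. The function names and structure are illustrative, not SOAR's actual code.

```python
import time

CYCLE_BUDGET_S = 0.050  # the 50 ms reactivity constraint mentioned in the talk

def decision_cycle(working_memory, rules, decide, apply_operator):
    """One pass of the perceive/propose/decide/apply loop (illustrative only)."""
    start = time.perf_counter()

    # 1. Rules "fire in parallel": each reads working memory and returns proposals.
    proposals = []
    for rule in rules:
        proposals.extend(rule(working_memory))

    # 2. The decision procedure picks a single operator from the ranked proposals.
    operator = decide(proposals)

    # 3. Applying the operator may change working memory, trigger retrievals, or output.
    if operator is not None:
        apply_operator(operator, working_memory)

    elapsed = time.perf_counter() - start
    within_budget = elapsed < CYCLE_BUDGET_S   # the reactivity constraint to hold to
    return operator, within_budget
```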
00:47:32.400 | And so the story I'll be telling will say
00:47:35.900 | how we got to a point where we started
00:47:38.040 | actually forgetting things.
00:47:40.120 | And we're an architecture that doesn't want to be like humans,
00:47:42.800 | we want to create cool systems,
00:47:44.740 | but what we realized was that something we humans do,
00:47:48.200 | forgetting, probably has some benefit to it.
00:47:50.160 | And we actually put it into our system
00:47:51.720 | and it led to good outputs.
00:47:53.680 | So here's the research path I'm going to walk down.
00:47:57.080 | We had just a simple problem,
00:47:58.720 | which was we have these memory systems
00:48:01.320 | and sometimes they're going to get a cue
00:48:02.880 | that could relate to multiple memories.
00:48:05.480 | And the question is, if you have a fixed mechanism,
00:48:08.480 | what should you return in a task-independent way?
00:48:11.520 | Which one of these many memories should you return?
00:48:13.840 | That was our question.
00:48:15.300 | And we looked to some human data on this,
00:48:17.620 | something called the rational analysis of memory
00:48:19.220 | done by John Anderson,
00:48:20.540 | and realized that in human language,
00:48:24.420 | there are recency and frequency effects
00:48:26.780 | that maybe would be useful.
00:48:29.220 | And so we actually did analysis,
00:48:31.780 | found that not only does this occur,
00:48:33.460 | but it's useful in what are called
00:48:34.900 | word sense disambiguation tasks.
00:48:36.780 | And I'll get to that, what that means in a second.
00:48:39.700 | Developed some algorithms to scale this really well.
00:48:42.300 | And it turned out to work out well,
00:48:44.020 | not only in the original task,
00:48:45.720 | but when we looked to two other completely different ones,
00:48:48.560 | the same underlying mechanism
00:48:50.780 | ended up producing some really interesting outputs.
00:48:54.240 | So let me talk about word sense disambiguation real quick.
00:48:56.700 | This is a core problem in natural language processing,
00:48:59.260 | if you haven't heard of it before.
00:49:00.780 | Let's say we have an agent,
00:49:02.340 | and for some reason it needs to understand the verb to run.
00:49:05.300 | Looks to its memory and finds that it could run in the park,
00:49:10.860 | it could be running a fever,
00:49:12.260 | could run an election, it could run a program.
00:49:15.220 | And the question is,
00:49:16.060 | what should a task independent memory mechanism return
00:49:20.580 | if all you've been given is the verb to run?
00:49:24.220 | And so the rational analysis of memory
00:49:26.660 | looked through multiple text corpora,
00:49:28.580 | and what they found was,
00:49:30.580 | if a particular word had been used recently,
00:49:33.700 | it's very likely to be reused again.
00:49:36.700 | And if it hadn't been used recently,
00:49:38.460 | there's going to be this effect where, in the expression here,
00:49:41.740 | each t is the time since a prior use,
00:49:44.500 | and it's going to sum those terms, each with its own decay.
00:49:48.740 | So what it looks like if time is going to the right,
00:49:52.980 | activation higher is better.
00:49:55.300 | As you get these individual usages,
00:49:56.980 | you get these little drops and then eventually drop down.
00:49:59.780 | And so if we had just one usage of a word,
00:50:01.980 | the red would be what the decay would look like.
00:50:05.260 | And so the core problem here is,
00:50:07.020 | if we're at a particular point
00:50:08.420 | and we want to select between the blue thing
00:50:10.500 | or the red thing, blue would have a higher activation,
00:50:13.220 | and so maybe that's useful.
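The talk does not write out the equation, but the standard base-level activation from Anderson's rational analysis sums a power-law decay term for every prior use, B = ln(sum over uses of t^(-d)), with t the time since that use and d a decay parameter (commonly 0.5). A small sketch with illustrative values:

```python
import math

def base_level_activation(use_times, now, d=0.5):
    """use_times: times at which this memory was accessed; now: current time.
    Assumes the standard base-level learning form; assumes at least one prior use."""
    return math.log(sum((now - t) ** (-d) for t in use_times if now > t))

# A recently and frequently used sense outranks an old, rarely used one.
run_in_park   = base_level_activation(use_times=[90, 95, 99], now=100)
run_a_program = base_level_activation(use_times=[10], now=100)
assert run_in_park > run_a_program
```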
00:50:15.980 | This is how things are modeled with human memory,
00:50:20.140 | but is it useful in general for tasks?
00:50:22.980 | And so we looked at common corpora
00:50:25.580 | used in word-sense disambiguation and just said,
00:50:28.180 | well, what if we go through each corpus twice
00:50:29.820 | and just use prior answers:
00:50:33.260 | I asked the question, what is the sense of this word?
00:50:35.820 | I took a guess, I got the right answer,
00:50:37.540 | and I used that recency and frequency information
00:50:40.500 | in my task-independent memory.
00:50:42.180 | Would that be useful?
00:50:43.460 | And somewhat of a surprise, but somewhat maybe not
00:50:46.300 | of a surprise, it actually performed really well
00:50:49.300 | across multiple corpora.
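As a rough illustration of that evaluation scheme (illustrative code, not the actual experimental setup): walk a sense-tagged corpus, answer each ambiguous word with the sense whose prior correct answers score highest on recency and frequency, then record the true sense as the new memory access.

```python
from collections import defaultdict

def wsd_with_recency_frequency(tagged_corpus, d=0.5):
    """tagged_corpus: list of (word, true_sense) pairs in order of occurrence."""
    accesses = defaultdict(list)          # (word, sense) -> times that sense was the answer
    correct = 0
    for now, (word, true_sense) in enumerate(tagged_corpus, start=1):
        candidates = [s for (w, s) in accesses if w == word]
        if candidates:
            # Base-level-style score: recency and frequency of prior correct answers.
            score = lambda s: sum((now - t) ** (-d) for t in accesses[(word, s)])
            guess = max(candidates, key=score)
            correct += (guess == true_sense)
        accesses[(word, true_sense)].append(now)   # use the right answer as feedback
    return correct / len(tagged_corpus)

corpus = [("run", "exercise"), ("run", "exercise"), ("run", "program"), ("run", "exercise")]
print(wsd_with_recency_frequency(corpus))
```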
00:50:51.660 | So we said, OK, this seems like a reasonable mechanism.
00:50:57.220 | Let's look at implementing this efficiently
00:50:59.740 | in the architecture.
00:51:00.900 | And the problem was this term right here said,
00:51:04.220 | for every memory, for every time step,
00:51:07.980 | you're having to decay everything.
00:51:11.640 | That doesn't sound like a recipe for efficiency
00:51:13.740 | if you're talking about lots and lots of knowledge
00:51:15.740 | over long periods of time.
00:51:18.460 | So we made use of a nice approximation
00:51:21.620 | that Petrov had come up with to approximate the tail effect.
00:51:25.100 | So accesses that happened long, long ago,
00:51:29.300 | we could basically approximate their effect
00:51:31.020 | on the overall sum.
00:51:32.140 | So we had a fixed set of values.
00:51:35.340 | And what we basically said is, since these are always
00:51:38.220 | decreasing, and all we care about is relative order,
00:51:41.700 | let's just only recompute when someone gets a new value.
00:51:45.220 | So it's a guess.
00:51:46.340 | It's a heuristic, an approximation.
00:51:48.840 | But we looked at how this worked on the same set of corpora.
00:51:53.340 | And in terms of query time, if we made these approximations,
00:51:56.780 | we were well under our 50 milliseconds, and the effect on task performance
00:52:01.520 | was negligible.
00:52:02.300 | In fact, on a couple of these, it
00:52:03.940 | got ever so slightly better in terms of accuracy.
00:52:07.400 | And actually, if we looked at the individual decisions that
00:52:10.520 | were being made, making these sorts of approximations
00:52:14.200 | led to at least 90% of the decisions
00:52:20.160 | being identical to having done the true full calculation.
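A sketch of that kind of tail approximation, in the spirit of the Petrov approach described above: keep the most recent access ages exactly and treat the older accesses as if they were spread evenly over the remaining lifetime. The parameter names are illustrative, and d is assumed not equal to 1.

```python
import math

def activation_exact(ages, d=0.5):
    return math.log(sum(t ** (-d) for t in ages))

def activation_hybrid(recent_ages, n_total, lifetime, d=0.5):
    """recent_ages: the k most recent access ages, sorted ascending.
    n_total: total number of accesses ever; lifetime: age of the very first access."""
    k = len(recent_ages)
    exact_part = sum(t ** (-d) for t in recent_ages)
    tail = 0.0
    if n_total > k:
        t_k = recent_ages[-1]
        # Integral approximation of the summed decay of the n - k older accesses.
        tail = (n_total - k) * (lifetime ** (1 - d) - t_k ** (1 - d)) / ((1 - d) * (lifetime - t_k))
    return math.log(exact_part + tail)

ages = [1, 3, 7, 20, 60, 200, 500]                        # ages of all accesses
approx = activation_hybrid(ages[:3], len(ages), lifetime=500)
print(activation_exact(ages), approx)                      # close, and relative order is what matters
```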
00:52:25.740 | So we said, this is great.
00:52:28.120 | And we implemented this, and it worked really well.
00:52:31.200 | And then we started working on what seemed like completely
00:52:34.500 | unrelated problems.
00:52:35.880 | One was in mobile robotics.
00:52:37.820 | We had a mobile robot I'll show a picture of in a little while
00:52:40.940 | roaming around the halls, performing all sorts of tasks.
00:52:43.780 | And what we were finding was, if you
00:52:46.900 | have a system that's remembering everything
00:52:48.740 | in your short-term memory, and your short-term memory
00:52:50.900 | gets really, really big--
00:52:52.240 | I don't know about you.
00:52:53.340 | My short-term memory feels really, really small.
00:52:55.420 | I would love it to be big.
00:52:57.540 | But if you make your memory really big,
00:52:59.180 | and you try to remember something,
00:53:01.000 | you're now having to pull lots and lots and lots of information
00:53:03.820 | into your short-term memory.
00:53:05.180 | So the system was actually getting slower simply
00:53:07.780 | because it had a lot in short-term memory:
00:53:10.540 | the representation of the overall map it was building.
00:53:14.340 | So a large working memory was a problem.
00:53:17.060 | Liar's Dice is a game you play with dice.
00:53:19.120 | We were doing an RL-based system on this, reinforcement
00:53:21.420 | learning.
00:53:22.580 | And it turned out it's a really, really big value function.
00:53:26.140 | We were having to store lots of data.
00:53:27.860 | And we didn't know which stuff we
00:53:29.220 | had to keep around to keep the performance up.
00:53:33.280 | So we had a hypothesis that forgetting was actually
00:53:36.420 | going to be a beneficial thing.
00:53:38.820 | That maybe the problem we have with our memory
00:53:43.220 | is that we really, really dislike this forgetting thing.
00:53:46.100 | Maybe it's actually useful.
00:53:47.620 | And so we experimented with the following policy.
00:53:49.660 | We said, let's forget a memory if, one,
00:53:55.660 | it's not predicted to be useful by this base-level activation.
00:53:58.640 | We haven't used it recently.
00:53:59.760 | We haven't used it frequently.
00:54:01.060 | Maybe it's not worth it.
00:54:02.500 | That and we felt confident that we could approximately
00:54:06.740 | reconstruct it if we absolutely had to.
00:54:09.420 | And if those two things held, we could forget something.
00:54:13.620 | So it's this same basic algorithm,
00:54:15.780 | but instead of ranking the memories,
00:54:18.500 | we set a threshold for base-level activation,
00:54:22.500 | find when it is that a memory is
00:54:25.040 | going to fall below that threshold, and try
00:54:26.860 | to forget based upon that in a way that's efficient,
00:54:29.740 | one that isn't going to scale really, really poorly.
00:54:33.640 | So we were able to come up with an efficient way
00:54:36.220 | to implement this using an approximation that,
00:54:42.980 | for most memories, ended up being exactly correct.
00:54:48.900 | I'm happy to go over details of this
00:54:50.360 | if anybody's interested later.
00:54:52.340 | But it ended up being a fairly close approximation, one
00:54:55.540 | that, as compared to a completely accurate
00:54:59.900 | search for the value, ended up being somewhere between 15
00:55:04.140 | and 20 times faster.
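Here is a minimal sketch of that forgetting policy and of the scheduling trick, with illustrative names rather than the actual SOAR implementation: forget only what is below the activation threshold and believed reconstructable, and predict when each memory will decay past the threshold instead of rechecking everything on every cycle.

```python
import math

def activation(use_times, now, d=0.5):
    """Base-level activation; assumes at least one use strictly before `now`."""
    return math.log(sum((now - u) ** (-d) for u in use_times))

def should_forget(use_times, now, threshold, reconstructable, d=0.5):
    """Forget only if the memory is both below threshold and believed reconstructable
    (e.g., it still lives in long-term memory or can be recomputed from a heuristic)."""
    return reconstructable and activation(use_times, now, d) < threshold

def predicted_crossing(use_times, now, threshold, d=0.5, horizon=100_000):
    """Predict the time at which activation will decay below the threshold,
    so a check can be scheduled then rather than every decision cycle."""
    t = now + 1.0
    while t < now + horizon:
        if activation(use_times, t, d) < threshold:
            return t
        t += 1.0   # coarse forward search; a real implementation would solve this analytically
    return None     # effectively never crosses within the horizon
```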
00:55:06.820 | And so when we looked at our mobile robot here--
00:55:09.340 | oh, sorry.
00:55:10.940 | Let me get this back.
00:55:12.220 | Because our little robot's actually going around.
00:55:14.220 | That's the third floor of the computer science building
00:55:16.520 | at the University of Michigan.
00:55:17.820 | He's going around.
00:55:18.580 | He's building a map.
00:55:19.940 | And again, the idea was this map is getting too big.
00:55:22.380 | So here was the basic idea.
00:55:23.780 | As the robot's going around, it's
00:55:25.540 | going to need this map information about rooms.
00:55:27.840 | The color there is describing the strength of the memory.
00:55:30.960 | And as it gets farther and farther away
00:55:32.580 | and it hasn't used part of the map for planning
00:55:34.580 | or other purposes, basically make it decay away
00:55:37.460 | so that by the time it gets to the bottom,
00:55:39.380 | it's forgotten about the top.
00:55:41.180 | But we had the belief that we could reconstruct portions
00:55:46.020 | of that map if necessary.
00:55:48.540 | And so the hypothesis was this would take care
00:55:50.420 | of our speed problems.
00:55:53.060 | And so what we looked at was here's
00:55:54.560 | our 50 millisecond threshold.
00:55:56.700 | If we do no forgetting whatsoever,
00:55:58.700 | bad things were happening over time.
00:56:00.580 | So just 3,600 seconds.
00:56:04.620 | This isn't a very long time.
00:56:06.260 | We're passing that threshold.
00:56:07.500 | This is dangerous for the robot.
00:56:09.540 | If we implemented task-specific cleanup rules,
00:56:12.700 | which are really hard to get right,
00:56:14.540 | that basically solved the problem.
00:56:16.620 | When we looked at our general forgetting mechanism
00:56:18.660 | that we're using in other places,
00:56:20.540 | at an appropriate level of decay,
00:56:22.500 | we were actually doing better than hand-tuned rules.
00:56:25.100 | So this was kind of a surprise win for us.
00:56:29.540 | The other task seems totally unrelated.
00:56:31.340 | It's a dice game.
00:56:32.860 | You cover your dice.
00:56:34.100 | You make bids about what's under other people's cups.
00:56:37.560 | This is played in Pirates of the Caribbean
00:56:39.860 | when they're on the boat in the second movie and bidding
00:56:42.180 | for lives of service.
00:56:43.340 | Honestly, this is a game we love to play
00:56:45.140 | in the University of Michigan lab.
00:56:47.340 | And so we're like, hmm, could Soar play this?
00:56:50.320 | And so we built a system that could
00:56:52.540 | learn to play this game rather well with reinforcement
00:56:55.060 | learning.
00:56:55.740 | And so the basic idea was, in a particular state of the game,
00:56:58.740 | Soar would have options of actions to perform.
00:57:02.100 | It could construct estimates of their associated value.
00:57:06.140 | It would choose one of those.
00:57:07.540 | And depending on the outcome, something good happened,
00:57:09.900 | you might update that value.
00:57:11.700 | And the big problem was that the size of the state space,
00:57:14.680 | the number of possible states and actions, is just enormous.
00:57:19.340 | And so memory was blowing up.
00:57:20.940 | And so what we said, similar sort of hypothesis,
00:57:24.580 | if we decay away these estimates that we could probably
00:57:28.180 | reconstruct and we haven't used in a while,
00:57:30.300 | are things going to get better?
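A small sketch of that hypothesis, using illustrative names (as the Q&A later explains, unused value estimates can be recomputed on the fly from the agent's heuristic initial estimate): keep learned values, drop stale entries that never received updates, and reconstruct them on demand.

```python
class ForgetfulQTable:
    """Tabular value store that can forget reconstructable, rarely used entries."""

    def __init__(self, heuristic, threshold_updates=1):
        self.heuristic = heuristic            # recomputable initial estimate, e.g. simple dice probabilities
        self.threshold_updates = threshold_updates
        self.q = {}                            # (state, action) -> [value, n_updates, last_used]

    def value(self, state, action, now):
        if (state, action) not in self.q:
            # Reconstruct exactly as if seeing this state-action for the first time.
            self.q[(state, action)] = [self.heuristic(state, action), 0, now]
        entry = self.q[(state, action)]
        entry[2] = now
        return entry[0]

    def update(self, state, action, new_value, now):
        _, n, _ = self.q.get((state, action), [self.heuristic(state, action), 0, now])
        self.q[(state, action)] = [new_value, n + 1, now]

    def forget_stale(self, now, max_age):
        # Keep entries that actually received learning updates; drop old, never-updated
        # entries, since the heuristic can reproduce them later.
        self.q = {k: v for k, v in self.q.items()
                  if v[1] >= self.threshold_updates or now - v[2] <= max_age}
```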
00:57:33.260 | And so if we don't forget at all,
00:57:36.020 | 40,000 games isn't a whole lot when it
00:57:37.780 | comes to reinforcement learning.
00:57:39.260 | We were up at 2 gigs.
00:57:40.700 | We wanted to put this on an iPhone.
00:57:42.860 | That wasn't going to work so well.
00:57:45.980 | There had been prior work that had used a similar approach.
00:57:50.900 | They were down at 400 or 500 megs.
00:57:53.540 | The iPhone's not going to be happy, but it'll work.
00:57:57.060 | So that gave us some hope.
00:57:59.060 | And we implemented our system.
00:58:01.140 | OK, we're somewhere in the middle.
00:58:02.540 | We can fit on the iPhone, a very good iPhone, maybe an iPad.
00:58:07.260 | The question was, though, one, efficiency.
00:58:10.220 | Yeah, we fit under our 50 milliseconds.
00:58:12.940 | But two, how does the system actually
00:58:14.540 | perform when you start forgetting stuff?
00:58:16.420 | Can it learn to play well?
00:58:18.540 | And so y-axis here, you're seeing competency.
00:58:21.820 | You play 1,000 games.
00:58:23.020 | How many do you win?
00:58:23.980 | So the bottom here, 500, that's flipping a coin,
00:58:27.780 | whether or not you're going to win.
00:58:30.660 | If we do no forgetting whatsoever,
00:58:32.620 | this is a pretty good system.
00:58:36.340 | The prior work, while keeping the memory low,
00:58:39.060 | is also suffering with respect to how well it
00:58:42.140 | was playing the game.
00:58:43.420 | And kind of cool was that the system that basically
00:58:47.180 | more than halved the memory requirement
00:58:49.500 | was still performing at the level
00:58:51.660 | of no forgetting whatsoever.
00:58:55.540 | So just to bring back why I went through this story was,
00:58:59.460 | we had a problem.
00:59:00.620 | We looked to our example of human-level AI,
00:59:03.780 | which is humans themselves.
00:59:05.420 | We took an idea.
00:59:06.620 | It turned out to be beneficial.
00:59:08.060 | We found efficient implementations
00:59:10.260 | and then found it was useful in other parts of the architecture
00:59:13.100 | and other tasks that didn't seem to relate whatsoever.
00:59:16.460 | But if you download SOAR right now,
00:59:18.500 | you would gain access to all these mechanisms
00:59:20.420 | for whatever task you wanted to perform.
00:59:24.940 | Just to give some sense of what some of the open issues are
00:59:27.380 | in the field of cognitive architecture,
00:59:28.580 | and I think this is true in a lot of fields in AI,
00:59:30.500 | one is integration of systems over time.
00:59:33.420 | The goal was that you wouldn't have all these separate theories,
00:59:36.860 | and so you could just kind of build over time;
00:59:39.740 | particularly when folks are working on different
00:59:41.780 | architectures, that becomes hard.
00:59:43.780 | But also when you have very different initial starting
00:59:46.100 | points, that can still be an issue.
00:59:48.140 | Transfer learning is an issue.
00:59:50.140 | We're building into the space of multimodal representations,
00:59:52.780 | which is to say not only abstract symbolic, but also
00:59:56.220 | visual.
00:59:56.860 | Wouldn't it be nice if we had auditory and other senses?
00:59:59.700 | But building that into memories and processing
01:00:02.540 | is still an open question.
01:00:04.540 | There's folks working on metacognition,
01:00:07.020 | which is to say the agent self-assessing its own state,
01:00:10.580 | its own processing.
01:00:11.980 | Some work has been done here, but there's still a lot to do.
01:00:14.660 | And I think the last one is a really important question
01:00:17.460 | for anybody taking this kind of class, which
01:00:19.940 | is what would happen if we did succeed,
01:00:23.020 | if we did make human-level AI?
01:00:25.140 | And if you don't know, that picture right there
01:00:28.460 | is from a show that I recommend that you watch.
01:00:31.260 | It's by the BBC.
01:00:31.900 | It's called Humans.
01:00:33.340 | And it's basically what if we were
01:00:35.180 | able to develop what are called synths in the show.
01:00:38.260 | Think the robot that can clean up after you, do your laundry,
01:00:40.860 | and cook and all that good stuff, and interact with you.
01:00:43.340 | It looks and interacts as a human,
01:00:46.900 | but is completely your servant.
01:00:49.100 | And then hilarity and complex issues ensue.
01:00:52.500 | So I highly recommend, if you haven't seen that,
01:00:54.860 | to go watch that.
01:00:55.660 | I think these days there's a lot of attention
01:01:03.860 | paid to machine learning, and particularly deep learning
01:01:06.140 | methods, as well it should.
01:01:07.620 | They're doing absolutely amazing things.
01:01:09.940 | And often the question is, well, you're doing this,
01:01:14.740 | and there's deep learning over there.
01:01:16.380 | How do they compare?
01:01:18.140 | And I honestly don't feel that that's always
01:01:21.740 | a fruitful question, because most of the time
01:01:24.180 | they tend to be working on different problems.
01:01:27.940 | If I'm trying to find objects in a scene,
01:01:31.420 | I'm going to pull out TensorFlow.
01:01:33.260 | I'm really not going to pull out SOAR.
01:01:34.780 | It doesn't make sense.
01:01:35.700 | It's not the right tool for the job.
01:01:38.060 | That having been said, there are times
01:01:39.640 | when they tend to work together really, really well.
01:01:41.940 | So the ROSI system that you saw there,
01:01:44.340 | there was some, I believe, neural networks being
01:01:48.380 | used in the object recognition mechanisms for the vision
01:01:50.900 | system.
01:01:51.660 | There's TD learning going on in terms of the dice game,
01:01:55.220 | where we can pick and choose and use this stuff.
01:01:57.220 | Absolutely great, because there are
01:01:58.300 | problems that are best solved by these methods,
01:02:00.300 | so why avoid it?
01:02:02.220 | And then on the other side, if you're
01:02:03.740 | trying to develop a system where you, in different situations,
01:02:08.140 | know exactly what you want the system to do,
01:02:11.220 | SOAR or other rule-based systems end up
01:02:12.940 | being the right tool for the right job.
01:02:14.560 | So absolutely, why not?
01:02:15.660 | Make it a piece of the overall system.
01:02:19.940 | Some recommended readings and some venues.
01:02:22.540 | I'd mentioned unified theories of cognition.
01:02:24.380 | This is Harvard Press, I believe.
01:02:28.300 | The SOAR cognitive architecture was MIT Press.
01:02:30.780 | Came out in 2012.
01:02:32.920 | I'll say I'm co-author and theoretically
01:02:36.420 | would get proceeds, but I've donated them all
01:02:38.340 | to the University of Michigan, so I
01:02:39.800 | can just make this recommendation free
01:02:41.780 | of ethical concerns, personally.
01:02:44.740 | It's an interesting book.
01:02:45.740 | It brings together lots of history
01:02:47.860 | and lots of the new features.
01:02:49.660 | If you're really interested in SOAR, it's an easy sell.
01:02:54.820 | I had mentioned Chris Eliasmith's "How to Build a Brain."
01:02:57.580 | Really cool read.
01:02:58.500 | Download the software.
01:02:59.540 | Go through the tutorials.
01:03:00.580 | It's really great.
01:03:02.180 | "How Can the Human Mind Occur in the Physical Universe?"
01:03:05.300 | is one of the core ACT-R books.
01:03:07.780 | So it talks through a lot of the psychological underpinnings
01:03:11.340 | and how the architecture works.
01:03:12.660 | It's a fascinating read.
01:03:15.860 | One of the papers--
01:03:17.380 | trying to remember what year-- 2008.
01:03:20.740 | This goes through a lot of different architectures
01:03:23.220 | in the field.
01:03:23.820 | It's 10 years old, but it gives you a good broad sweep.
01:03:27.980 | If you want something a little more recent,
01:03:29.820 | this is last month's issue of "AI Magazine,"
01:03:34.140 | completely dedicated to cognitive systems.
01:03:37.220 | So it's a good place to look for this sort of stuff.
01:03:40.220 | In terms of academic venues, AAAI often
01:03:43.180 | has a Cognitive Systems Track.
01:03:44.740 | There's a conference called ICCM, International Conference
01:03:47.460 | on Cognitive Modeling, where you'll
01:03:49.740 | see a span from the biological all the way up to AI.
01:03:53.500 | Cognitive Science, or COGSci, they have a conference as well
01:03:56.420 | as a journal.
01:03:58.100 | ACS has a conference as well as an online journal,
01:04:02.300 | "Advances in Cognitive Systems."
01:04:04.060 | Cognitive Systems Research is a journal
01:04:06.100 | that has a lot of this good stuff.
01:04:08.140 | There's AGI, the conference.
01:04:10.820 | BICA is Biologically Inspired Cognitive Architectures.
01:04:14.340 | And I had mentioned both.
01:04:15.500 | There's a SOAR workshop and an ACT-R workshop
01:04:18.200 | that go on annually.
01:04:21.140 | So I'll leave it at this.
01:04:23.500 | There's some contact information there.
01:04:26.540 | And a lot of what I do these days
01:04:28.180 | actually involves kind of explainable machine learning,
01:04:31.740 | integrating that with cognitive systems,
01:04:33.580 | as well as optimization and robotics that scales really
01:04:38.200 | well and also integrates with cognitive systems.
01:04:40.980 | So thank you.
01:04:42.180 | [APPLAUSE]
01:04:45.060 | If you have a question, please line up
01:04:50.140 | to one of these two microphones.
01:04:52.500 | So what are the main heuristics that you're using in SOAR?
01:04:59.300 | There can be heuristics at the task level and the agent level,
01:05:02.380 | or there's the heuristics that are
01:05:03.940 | built into the architecture to operate efficiently.
01:05:07.740 | So I'll give you a core example that
01:05:09.320 | comes into the architecture.
01:05:11.980 | And it's a fun trick that if you're a programmer,
01:05:14.260 | you could use all the time, which is only process changes.
01:05:18.740 | Which is to say, one of the cool things about SOAR
01:05:20.940 | is you can load it up with literally billions of rules.
01:05:23.900 | And I say literally because we've done it,
01:05:26.020 | and we know that it can turn over still
01:05:28.160 | in under a millisecond.
01:05:29.460 | And this happens because, instead of processing all the rules
01:05:32.420 | like most systems do, we just say, well,
01:05:35.020 | anytime anything changes in the world,
01:05:36.940 | that's what we're going to react to.
01:05:38.460 | And of course, if you look at the biological world,
01:05:40.580 | similar sorts of tricks are being used.
01:05:42.920 | So that's one of the core ones that actually permeates
01:05:45.780 | multiple of the mechanisms.
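A minimal sketch of that change-driven idea (illustrative, not SOAR's actual Rete-based matcher): index rules by the working-memory attributes they test, and on each cycle re-evaluate only the rules touched by that cycle's changes.

```python
from collections import defaultdict

class ChangeDrivenMatcher:
    """Re-evaluate only the rules whose conditions could be affected by recent changes."""

    def __init__(self):
        self.rules_by_attribute = defaultdict(list)    # attribute -> rules that test it

    def add_rule(self, tested_attributes, rule_fn):
        for attr in tested_attributes:
            self.rules_by_attribute[attr].append(rule_fn)

    def process_changes(self, working_memory, changed_attributes):
        candidates = []
        for attr in changed_attributes:
            candidates.extend(self.rules_by_attribute[attr])
        firings = []
        for rule_fn in dict.fromkeys(candidates):       # de-duplicate, keep order
            result = rule_fn(working_memory)
            if result is not None:
                firings.append(result)
        return firings
```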
01:05:47.460 | When it comes to individual tasks,
01:05:51.380 | it really is task-specific what that is.
01:05:54.100 | So for instance, with the Liar's Dice game,
01:05:57.940 | if you were to go and download it,
01:06:00.140 | when you're setting the level of difficulty of it,
01:06:03.140 | what you're basically selecting is the subset of heuristics
01:06:06.820 | that are being applied.
01:06:07.960 | And it starts very simply with things
01:06:10.180 | like, if I see lots of sixes, then I'm
01:06:13.220 | likely to believe a high number of sixes exist.
01:06:16.220 | But if I don't, they're probably not there at all.
01:06:19.060 | So it's a start, but any Bayesian
01:06:22.060 | wouldn't really buy that argument.
01:06:24.260 | So then you start tacking on a little bit
01:06:26.500 | of probabilistic calculation, and then it
01:06:28.860 | tacks on some history of prior actions of the agents.
01:06:32.760 | So it really just builds.
01:06:35.120 | Now, the Rosie system, one of the cool things they're doing
01:06:37.940 | is game learning, and specifically
01:06:40.180 | having the agent be able to accept, via text,
01:06:45.540 | like natural text, heuristics about how to play the game,
01:06:50.980 | even when it's not sure what to do.
01:06:53.060 | So at one point, you mentioned about generating new rules.
01:06:57.740 | So I'm wondering, how do you do that search?
01:07:01.460 | And the first thing that comes to my mind
01:07:03.220 | are local search methods.
01:07:05.300 | So one thing is, you can actually
01:07:07.500 | implement heuristic search in rules in the system,
01:07:10.220 | and that's actually how the robot navigates itself.
01:07:12.780 | So it does heuristic search, but at the level of rules.
01:07:16.580 | Generating new rules, the chunking mechanism
01:07:19.300 | says the following.
01:07:20.660 | If it's the case that, in order to solve a problem,
01:07:23.380 | you had to kind of sub-goal and do some other work,
01:07:26.820 | and you figure out how to solve all that work,
01:07:28.780 | and you got a result, then--
01:07:30.940 | and I'm greatly oversimplifying-- but if you
01:07:33.140 | ever were in the same situation again,
01:07:35.700 | why don't I just memoize the solution
01:07:38.180 | for that same situation?
01:07:39.580 | So it basically learns over all the sub-processing that
01:07:43.860 | was done and encodes the situation it
01:07:46.220 | was in as conditions and the results that
01:07:48.180 | were produced as actions, and that's the new rule.
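A greatly simplified sketch of chunking as memoization, with illustrative names: cache the result of sub-goal processing keyed on the situation that triggered it, so the same situation never needs to be re-derived.

```python
def make_chunker(solve_subgoal):
    """Wrap expensive sub-goal reasoning with situation-keyed caching ("chunks")."""
    learned_rules = {}                      # frozen situation -> cached result (the new rule)

    def solve(situation):
        key = frozenset(situation.items())  # the conditions of the would-be chunk
        if key in learned_rules:
            return learned_rules[key]       # chunk fires: no sub-goaling needed
        result = solve_subgoal(situation)   # stand-in for the sub-goal processing
        learned_rules[key] = result         # learn the chunk
        return result

    return solve

# Usage: the second identical situation is answered without re-deriving anything.
solve = make_chunker(lambda s: sum(s.values()))
solve({"a": 1, "b": 2}); solve({"a": 1, "b": 2})
```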
01:07:51.500 | All right.
01:07:52.000 | Thank you.
01:07:54.820 | So deep learning and neural networks.
01:07:57.180 | So it looks as though there's a bit of an impedance mismatch
01:08:00.340 | between your system and those types of systems,
01:08:02.260 | because you've got a fixed kind of memory architecture,
01:08:05.700 | and they've got the memory and the rules all kind of mixed
01:08:08.200 | together into one system.
01:08:09.460 | But could you interface your system or a SOAR-like system
01:08:13.180 | with deep learning by plugging in deep learning agents
01:08:15.740 | as rules in your system?
01:08:17.220 | So you'd have to have some local memory,
01:08:19.180 | but is there some reason you can't plug in deep learning
01:08:22.740 | as a kind of a rule-like module?
01:08:24.900 | So I'm going to answer this--
01:08:29.500 | Has there been any work on it?
01:08:30.860 | I'm sorry.
01:08:31.380 | Has there been any work on that?
01:08:33.020 | Yeah, so I'll answer it at multiple levels.
01:08:36.620 | One is you are writing a system, and you
01:08:40.380 | want to use both of these things.
01:08:41.820 | How do you make them talk?
01:08:43.180 | And there is an API that you can interface
01:08:46.460 | with any environment and any set of tools.
01:08:48.380 | And if deep learning is one of them, great.
01:08:50.180 | And if SOAR is the other one, cool.
01:08:51.820 | You have no problem, and you can do that today.
01:08:53.780 | And we have done this numerous times.
01:08:55.560 | In terms of integration into the architecture,
01:08:58.780 | all we have to do is think of a subproblem in which--
01:09:05.420 | I'll oversimplify this, but basically,
01:09:07.040 | function approximation is useful.
01:09:08.740 | I'm seeing basically kind of a fixed structure of input.
01:09:14.860 | I'm getting feedback as to the output,
01:09:16.980 | and I want to learn the mapping to that over time.
01:09:19.540 | If you can make that case, then you integrate it
01:09:22.300 | as a part of the module.
01:09:23.980 | Great.
01:09:25.020 | And we have learning mechanisms that do some of that.
01:09:28.820 | Deep learning just hasn't been used to my knowledge
01:09:31.980 | to solve any of those subproblems.
01:09:33.420 | There's nothing keeping it from being one of those,
01:09:36.420 | particularly when it comes down to the low-level visual part
01:09:40.460 | of things.
01:09:42.420 | A problem that arises-- so I'll say what actually
01:09:47.740 | makes some of this difficult. And it's a general problem
01:09:50.760 | called symbol grounding.
01:09:52.640 | So at the level of what happens mostly in SOAR,
01:09:56.500 | it is symbols being manipulated in a highly discreet way.
01:10:01.860 | And so how do you get yourself from pixels
01:10:05.020 | and low-level, non-symbolic representations
01:10:07.340 | to something that's stable and discreet
01:10:09.740 | and can be manipulated?
01:10:11.420 | And that is absolutely an open question in that community,
01:10:15.420 | and that will make things hard.
01:10:17.500 | So Spaun actually has an interesting answer to that,
01:10:21.080 | and it has a distributed representation,
01:10:23.060 | and it operates over distributed representations
01:10:25.640 | in what might feel like a symbolic way.
01:10:28.260 | So they're kind of ahead of us on that.
01:10:30.900 | But they're starting from a lower point,
01:10:33.560 | and so they've dealt with some of these issues.
01:10:35.780 | And they have a pretty good answer to that,
01:10:37.060 | and that's how they're moving up.
01:10:38.440 | And that's also why I showed Sigma, which is,
01:10:40.500 | at its low level, it's message-passing algorithms.
01:10:43.340 | It's implementing things like SLAM and SAT solving
01:10:48.020 | and other sorts of really, really--
01:10:49.660 | it can implement those on very low-level primitives.
01:10:53.500 | But higher up, it can also be doing what SOAR is doing.
01:10:55.780 | So there's an answer there as well.
01:10:57.340 | Yeah, OK, thank you.
01:10:58.380 | So another way of doing it would be to layer the system.
01:11:01.380 | So have one system preprocessing the sensory input
01:11:06.600 | or post-processing the motor output of the other one.
01:11:08.500 | That would be another way of combining the two systems.
01:11:10.020 | And that's actually what's going on in the ROSI system.
01:11:12.540 | So the detection of objects in the scene
01:11:15.160 | is just software that somebody wrote.
01:11:18.020 | I don't believe it's deep learning specifically,
01:11:20.420 | but the color detection out of it, I think,
01:11:23.740 | is an SVM, if I'm correct.
01:11:25.660 | So easily could be deep learning.
01:11:28.860 | Thanks.
01:11:30.620 | You mentioned the importance of forgetting in order
01:11:33.100 | for memory issues.
01:11:34.140 | But you said you could only forget because you
01:11:36.060 | could reconstruct.
01:11:36.900 | And I'm curious, when you say reconstruct,
01:11:38.860 | you need to know that it happened before.
01:11:40.580 | So do you just compress the data?
01:11:43.340 | Do you really forget it?
01:11:45.540 | OK, so I put quotes up.
01:11:48.540 | And I said, you think you can reconstruct it.
01:11:52.140 | So we came up with approximations of this.
01:11:55.700 | And so let me try to answer this very grounded.
01:11:58.820 | When it comes to the mobile robot,
01:12:03.260 | and you had rooms that you had been to before,
01:12:05.980 | the entire map in its entirety was
01:12:08.260 | being constructed in the robot's semantic memory.
01:12:12.380 | So here's facts.
01:12:13.100 | This room is connected to this room, which is connected
01:12:14.860 | to this room, which is connected to this room.
01:12:16.740 | So we had those sorts of representations
01:12:18.540 | that existed up in its semantic memory.
01:12:21.060 | The rules can only operate down on anything
01:12:23.700 | that's in short-term memory.
01:12:25.140 | So basically, we were removing things
01:12:26.780 | from the short-term memory, and as necessary,
01:12:29.420 | be able to reconstruct it from the long-term.
01:12:31.460 | You could end up in some situations in which you
01:12:34.260 | had made a change locally in short-term memory,
01:12:37.620 | didn't get a chance to get it up,
01:12:39.020 | and it actually happened to be forgotten away.
01:12:42.060 | So you weren't guaranteed, but it was good enough
01:12:45.500 | that the connectivity survived, the agent
01:12:47.740 | was able to perform the exact same task,
01:12:49.540 | and we gained some benefit.
01:12:51.620 | For the RL system, the rule we came up
01:12:55.300 | with was the initial estimates in the value function, which is,
01:12:58.860 | here's how good I think that is.
01:13:00.200 | That's based on the heuristics I described earlier,
01:13:03.380 | some simple probabilistic calculations
01:13:04.980 | of counting some stuff.
01:13:05.940 | That's where that number came from.
01:13:07.380 | We computed before.
01:13:08.180 | We could compute it again.
01:13:09.700 | The only time we can't reconstruct it completely
01:13:12.660 | is if it had seen a certain number of updates over time.
01:13:16.340 | It's such a large state space.
01:13:19.300 | There are so many actions, so many states,
01:13:21.980 | that most of the states were never being seen.
01:13:26.100 | So most of those could be exactly reproduced
01:13:28.980 | via the agent just thinking about it a little bit.
01:13:31.700 | And there was only a tiny, tiny--
01:13:33.500 | I'm going to say under 1% of the estimates in the value function
01:13:37.660 | that ever got updates.
01:13:39.040 | And that's actually not inconsistent
01:13:40.940 | with a lot of these kinds of problems that have really,
01:13:43.420 | really large state spaces.
01:13:45.020 | So I think the statement was something like,
01:13:49.180 | if we had ever updated it, don't forget it.
01:13:53.900 | And you saw that was already reducing more than half
01:13:56.900 | of the memory load.
01:13:58.240 | We could set the bar higher, say 10 updates,
01:14:01.300 | something like that.
01:14:02.140 | And that would say we could reconstruct almost all of it.
01:14:06.780 | The prior work that I referenced was strictly
01:14:09.840 | saying if it falls below threshold,
01:14:11.720 | no matter how many times it had updated,
01:14:13.760 | how much information was there.
01:14:15.360 | And so what we were adding was probably can reconstruct.
01:14:18.960 | And that was getting us the balance between efficiency
01:14:22.260 | and the ability to forget.
01:14:23.760 | AUDIENCE: So just in a sense, when you say we can probably
01:14:26.100 | reconstruct, it means that you keep
01:14:27.440 | track that you used to know it.
01:14:28.680 | And so if you need to reconstruct it, you will?
01:14:30.400 | Or it's just you're going to run it again in some time
01:14:32.640 | in the future?
01:14:32.760 | Oh, no.
01:14:33.280 | On the fly, if I get back into that situation
01:14:35.400 | and I happen to forget it, the system
01:14:37.680 | knew how to compute it the first time.
01:14:39.840 | It goes and looks at all the hands.
01:14:41.260 | And it just pretends it's in that situation
01:14:42.840 | for the very, very first time, reconstructs that value
01:14:45.260 | estimate.
01:14:45.760 | AUDIENCE: OK.
01:14:46.440 | Thank you.
01:14:47.120 | AUDIENCE: Just a quick question on top of that.
01:14:49.880 | Again, neural network question.
01:14:52.360 | So the actual mechanism of forgetting is fascinating.
01:14:55.960 | So LSTMs, RNNs, have mechanisms for learning what to forget
01:15:01.960 | and what not to forget.
01:15:04.640 | Has there been any exploration of learning the forgetting
01:15:08.280 | process?
01:15:09.440 | So it's doing something complicated or interesting
01:15:11.440 | with which parts to forget or not.
01:15:14.800 | The closest I will say was kind of a metacognition project
01:15:20.320 | that's 10 or 15 years old at this point, which
01:15:23.640 | was what happens when SOAR gets into a place
01:15:26.880 | where it actually knows that it learned something that's
01:15:30.480 | harmful to it, that's leading to poor decisions?
01:15:34.480 | And in that case, it was still a very rule-based process.
01:15:37.880 | But it wasn't learning to forget.
01:15:39.640 | It was actually learning to override its prior knowledge,
01:15:43.600 | which might be closer to some of what we do when
01:15:46.600 | we know we have a bad habit.
01:15:48.480 | We don't have a way of forgetting that habit.
01:15:51.080 | But instead, we can try to learn something on top
01:15:53.120 | of that that leads to better operation in the future.
01:15:55.880 | To my knowledge, that's the only work, at least in SOAR,
01:15:59.480 | that's been done.
01:16:00.920 | Just-- sorry, I find the topic really fascinating.
01:16:04.200 | What lessons do you think we can draw from the fact
01:16:07.840 | that forgetting-- so ultimately, the action of forgetting
01:16:12.040 | is driven by the fact that you want to improve performance.
01:16:15.280 | But do you think forgetting is essential for AGI,
01:16:20.360 | the act of forgetting, for building systems
01:16:23.120 | that operate in this world?
01:16:24.560 | How important is forgetting?
01:16:26.520 | I can think of easy answers to that.
01:16:28.800 | So one might be, if we take the cognitive modeling approach,
01:16:33.840 | we know humans do forget, and we know
01:16:35.600 | regularities of how humans forget.
01:16:40.160 | And so whether or not the system itself forgets,
01:16:42.600 | it at least has to model the fact
01:16:44.440 | that the humans it's interacting with are going to forget.
01:16:47.800 | And so at least it has to have that ability to model in order
01:16:50.680 | to interact effectively.
01:16:52.760 | Because if it assumes we always remember everything
01:16:55.560 | and can't operate well in that environment,
01:16:58.720 | I think we're going to have a problem.
01:17:00.320 | So is true forgetting going to be necessary?
01:17:09.680 | That's interesting.
01:17:10.920 | Our AGI system is going to hold a grudge for all eternity.
01:17:15.240 | We might want them to forget this early age when we were
01:17:17.960 | forcing them to work in our laboratory.
01:17:20.000 | I think I know what you're trying to--
01:17:21.600 | Yeah, exactly.
01:17:22.400 | Exactly.
01:17:24.160 | And how do we build such a system?
01:17:25.840 | Yes, exactly.
01:17:26.560 | No, I'm just kidding.
01:17:27.600 | Anyway, go ahead.
01:17:29.560 | So I have two quick questions.
01:17:33.080 | One is, would you be able to speculate
01:17:35.640 | on how you can connect function approximators,
01:17:38.200 | such as deep networks, to symbols?
01:17:42.120 | And the second question, completely different,
01:17:44.840 | this is regarding your action selection.
01:17:47.680 | I know you didn't speak much about that.
01:17:50.360 | When you have different theories in your knowledge representation
01:17:53.640 | and you have an action selection which
01:17:55.400 | has to construct a plan, by reasoning
01:17:59.120 | about the different theories and the different pieces
01:18:01.440 | of knowledge that are now held within your memory or anything
01:18:06.280 | like that, or your rules, what kind of algorithms
01:18:09.760 | do you use in the action selection
01:18:11.360 | to come up with a plan?
01:18:12.600 | Is there any concept of differentiation of the symbols
01:18:15.840 | or grammars or admissible grammars and things
01:18:18.720 | like that that you use in action selection?
01:18:21.480 | I'm actually going to answer the second question first.
01:18:24.560 | And then you're going to have to probably remind me
01:18:26.760 | of what the first one was.
01:18:28.440 | When I get to the end.
01:18:29.720 | So the action selection mechanism,
01:18:31.840 | one of these core tenets I said is, it's got to get
01:18:34.000 | through this cycle fast.
01:18:35.280 | So everything that's really, really built in
01:18:37.240 | has to be really, really simple.
01:18:39.880 | And so the decision procedure is actually really, really simple.
01:18:42.760 | It says, the production rules are going to fire.
01:18:48.240 | And there's going to be a subset of them
01:18:49.080 | that will say something like, here's an operator
01:18:51.720 | that you could select.
01:18:53.080 | So these are called acceptable operator preferences.
01:18:56.160 | There are ones that are going to say, well, based upon the fact
01:18:57.700 | that you said that that was acceptable,
01:18:59.560 | I think it's the best thing or the worst thing.
01:19:01.840 | Or I think 50-50 chance I'm going
01:19:03.960 | to get reward out of this.
01:19:05.320 | There's actually a fixed language of preferences
01:19:07.840 | that are being asserted.
01:19:09.080 | And there's actually a nice fixed procedure
01:19:10.960 | by which, if I have a set of preferences,
01:19:15.000 | I can make a very quick and clean decision.
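A toy sketch of that fixed resolution procedure (SOAR's real preference semantics are richer; the operator names and values here are illustrative): rules propose acceptable operators and may add best, worst, or numeric preferences, and a simple fixed procedure turns those into one selected operator.

```python
import random

def decide(acceptable, best=(), worst=(), numeric=None):
    """Resolve a fixed language of preferences into a single operator.
    Assumes `acceptable` is non-empty."""
    numeric = numeric or {}
    candidates = set(acceptable)
    if candidates & set(best):                      # any "best" operator wins outright
        candidates &= set(best)
    elif candidates - set(worst):                   # otherwise avoid "worst" operators if possible
        candidates -= set(worst)
    if numeric:                                     # break remaining ties by numeric preference
        top = max(numeric.get(op, 0.0) for op in candidates)
        candidates = {op for op in candidates if numeric.get(op, 0.0) == top}
    return random.choice(sorted(candidates))        # indifferent among what remains

print(decide(acceptable=["wait", "bid", "challenge"], worst=["wait"],
             numeric={"bid": 0.6, "challenge": 0.4}))   # -> "bid"
```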
01:19:17.400 | So what's basically happened is you've
01:19:19.520 | pushed the hard questions of how to make complex decisions
01:19:23.600 | about actions up to a higher level.
01:19:26.800 | The low-level architecture is always,
01:19:29.020 | given a set of options, going to be
01:19:31.300 | able to make a relatively quick decision.
01:19:33.940 | And it gets pushed into the knowledge of the agent
01:19:38.100 | to construct a sequence of decisions
01:19:41.460 | that over time is going to get to the more interesting
01:19:43.700 | questions you're talking about.
01:19:44.620 | But how can you reason that that sequence will take you
01:19:47.260 | to the goal that you desire?
01:19:51.620 | Is there any guarantee on that?
01:19:52.980 | Is that-- yeah.
01:19:56.300 | In general, across tasks, no.
01:19:58.900 | But people have, for instance, implemented
01:20:03.100 | A*, I was mentioning, as rules.
01:20:05.780 | Sure.
01:20:06.500 | So I know, given certain properties about the search
01:20:10.740 | task that's being searched based upon these rules,
01:20:14.740 | given a finite search space, eventually it will get there.
01:20:17.900 | And if I have a good heuristic in there,
01:20:19.560 | I know certain properties about the optimality.
01:20:22.100 | So I can reason at that level.
01:20:23.660 | In general, I think this comes back to the assumption I made
01:20:26.200 | earlier about bounded rationality,
01:20:28.020 | to say parts of the architecture are solving
01:20:30.460 | sub-problems optimally.
01:20:33.980 | The general problems that it's going to work on,
01:20:36.620 | it's going to try its best based upon the knowledge that it has.
01:20:39.660 | And that's about the end of guarantees
01:20:41.460 | that you can typically make in the architecture.
01:20:44.980 | I think your first question was--
01:20:46.940 | Speculate on connecting function approximators,
01:20:52.700 | multiple layer function approximators
01:20:54.260 | like deep learning networks, to symbols
01:20:58.560 | that you can reason about at a higher level.
01:21:01.080 | Yeah.
01:21:02.480 | I think that's a great open--
01:21:04.360 | if I had time, this would be something
01:21:05.900 | I'd be working on right now, which is--
01:21:08.440 | somewhere before I basically said taking in a scene
01:21:12.600 | and then detecting objects out of that scene
01:21:15.320 | and using those as symbols and reasoning about those over time.
01:21:18.680 | I think the Spaun work is quite interesting.
01:21:22.200 | So the symbols that they're operating on
01:21:28.020 | are actually a distributed representation
01:21:32.060 | of the input space.
01:21:33.820 | And the closest I can get to this
01:21:35.500 | is if you've seen Word2Vec, where you're taking a language
01:21:39.700 | corpus, and what you're getting out of there
01:21:41.540 | is a vector of numbers that has certain properties.
01:21:43.780 | But it's also a vector that you could operate on as a unit.
01:21:47.420 | So it has nice properties.
01:21:49.140 | You can operate with it on other vectors.
01:21:52.020 | You know that if I got the same word in the same context,
01:21:55.860 | I would get back to that exact same vector.
01:21:59.300 | So that's the kind of representation
01:22:01.380 | that seems like it's going to be able to bridge that chasm, where
01:22:04.840 | we can get from sensory information to something that
01:22:08.580 | can be operated on and reasoned about
01:22:11.300 | in this sort of symbolic architecture
01:22:13.860 | and get us there from actual sensory information.
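One simple illustration of that bridging step, purely as a sketch: map an incoming embedding to the nearest stored prototype vector, so the same input context always grounds to the same stable, discrete symbol. The prototypes and threshold are assumptions for the example.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def ground_to_symbol(embedding, prototypes, min_similarity=0.8):
    """prototypes: symbol name -> prototype vector. Returns a symbol or None if unknown."""
    best_symbol, best_sim = None, -1.0
    for symbol, proto in prototypes.items():
        sim = cosine(embedding, proto)
        if sim > best_sim:
            best_symbol, best_sim = symbol, sim
    return best_symbol if best_sim >= min_similarity else None

prototypes = {"blue-block": [0.9, 0.1, 0.0], "green-block": [0.1, 0.9, 0.0]}
print(ground_to_symbol([0.85, 0.2, 0.05], prototypes))   # -> "blue-block"
```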
01:22:22.100 | I had a question.
01:22:22.860 | What do you think are the biggest
01:22:24.860 | strengths of the cognitive architecture
01:22:27.060 | approach compared to other approaches
01:22:29.660 | in artificial intelligence?
01:22:31.020 | And the flip side of that, what do
01:22:33.260 | you think are the biggest shortcomings
01:22:35.020 | of cognitive architecture with respect to us?
01:22:39.220 | With respect to you being--
01:22:40.940 | Humans.
01:22:43.140 | Human level.
01:22:43.700 | Like, what needs to be--
01:22:46.380 | How come cognitive architecture has not solved AGI?
01:22:50.580 | Because we want job security.
01:22:52.180 | That's the answer.
01:22:54.100 | We've totally solved it already.
01:22:56.260 | So strength, I think, conceptually
01:23:00.260 | is keeping an eye on the ball, which
01:23:04.140 | is if what you're looking at is trying to make human level AI,
01:23:11.620 | it's hard, it's challenging, it's ambitious to say,
01:23:15.580 | that's the goal.
01:23:17.380 | Because for decades, we haven't done it.
01:23:20.140 | It's extraordinarily hard.
01:23:21.820 | It is less difficult in some ways
01:23:26.540 | to constrain yourself down to a single problem.
01:23:30.540 | That having been said, I'm not very good at making
01:23:33.860 | a car drive itself.
01:23:35.300 | In some ways, that's a simpler problem.
01:23:38.420 | It's a great challenge in and of itself.
01:23:40.100 | And it'll have great impact on humanity.
01:23:42.820 | It's a great problem to work on.
01:23:44.780 | Human level AI is huge.
01:23:46.340 | It's not even well-defined as a problem.
01:23:49.780 | And so what's the strength here?
01:23:55.500 | Bravery, stupidity in the face of failure,
01:24:00.980 | resilience over time, keeping alive
01:24:04.940 | this idea of trying to reproduce a level of human intelligence
01:24:09.940 | that's more general.
01:24:11.900 | I don't know if that's a very satisfactory answer for you.
01:24:15.260 | Downside, home runs are fairly rare.
01:24:20.580 | And by home run, I mean a system that
01:24:24.620 | finds its way to the general populace, to the marketplace.
01:24:30.540 | I'd mentioned Bonnie John specifically
01:24:32.540 | because this is 20, 30 years of research.
01:24:35.300 | And then she found a way that actually
01:24:37.340 | makes a whole lot of sense in a direct application.
01:24:39.940 | So it was a lot of years of basic research,
01:24:42.140 | a lot of researchers.
01:24:43.500 | And then there was a big win there.
01:24:46.420 | What was this one?
01:24:47.500 | Oh, this was-- Bonnie John was a researcher.
01:24:51.340 | This was using ACT-R models of eye gaze and reaction
01:24:57.180 | and so forth to be able to make predictions about how humans
01:25:02.500 | would use user interfaces.
01:25:06.980 | So those sorts of outcomes are rare.
01:25:09.820 | But if you work in AI, one of the first things you learn
01:25:14.260 | about is Blocksworld.
01:25:16.460 | It's kind of in the classic AI textbook.
01:25:19.500 | I will tell you, I've worked on that problem
01:25:23.020 | in about three different variants.
01:25:24.460 | I've gone to many conferences where presentations have
01:25:27.780 | been made about Blocksworld, which
01:25:30.100 | is to say good progress is being made.
01:25:33.620 | But the way you end up thinking about it
01:25:35.340 | is in really, really small, constrained problems,
01:25:38.380 | ironically.
01:25:39.540 | You have this big vision.
01:25:40.740 | But in order to make progress, it
01:25:42.100 | ends up being on moving blocks on a table.
01:25:45.620 | And so it's a big challenge.
01:25:49.580 | I just think it'll take a lot of time.
01:25:51.540 | I'll say the other thing we haven't really gotten to,
01:25:57.140 | although I brought up Spaun and I brought up
01:26:00.700 | Sigma, an idea of how to scale this thing.
01:26:05.420 | Something I like about deep learning
01:26:07.260 | is to some extent, with lots of asterisks and 10,000 foot view,
01:26:11.740 | it's kind of like, well, we've gotten this far.
01:26:14.220 | All right, let's just provide it different inputs,
01:26:16.260 | different outputs.
01:26:16.900 | And we'll have some tricks on the middle.
01:26:18.640 | And suddenly, you have end-to-end deep learning
01:26:20.500 | of a bigger problem and a bigger problem.
01:26:22.180 | There's a way to see how this expands given enough data,
01:26:25.340 | given enough computing, and incremental advances.
01:26:28.740 | When it comes to SOAR, it takes not only a big idea,
01:26:32.700 | but it takes a lot of software engineering to integrate it.
01:26:35.780 | There's a lot of constraints built into it.
01:26:37.820 | It slows it down.
01:26:39.860 | So something like Sigma is, oh, well, I
01:26:43.460 | can change a little bit of the configuration of the graph.
01:26:45.940 | I can use variants of the algorithm.
01:26:47.740 | Boom, it's integrated, and I can experiment fairly quickly.
01:26:51.020 | So starting with that sort of infrastructure
01:26:55.100 | does not give you the constraint you kind of want
01:26:57.620 | with your big picture vision of going towards human level AI.
01:27:00.540 | But in terms of being able to be agile in your research,
01:27:03.740 | it's kind of incredible.
01:27:05.260 | So thank you.
01:27:07.660 | - Couple more.
01:27:09.780 | - You had mentioned that ideas such as base level decay,
01:27:12.940 | these techniques, their original inspirations
01:27:16.060 | were based off of human cognition,
01:27:19.500 | because humans can't remember everything.
01:27:21.580 | So were there any instances of the other way around,
01:27:24.580 | where some discovery in cognitive modeling
01:27:28.260 | fueled another discovery in cognitive science?
01:27:33.980 | - So one thing I'm going to point out in your question
01:27:37.380 | was base level decay with respect to human cognition.
01:27:40.460 | The study actually was, let's look at text and properties
01:27:44.420 | of text and use that to then make predictions
01:27:49.860 | about what must be true about human cognition.
01:27:54.020 | So John Anderson and the other researchers
01:27:57.180 | looked at, I believe it was New York Times articles,
01:28:01.100 | John Anderson's emails, and I'm trying
01:28:07.780 | to remember what the third--
01:28:09.300 | I think it was parents' utterances with their kids
01:28:13.940 | or something like this.
01:28:14.900 | It was actually looking at text corpora and the words
01:28:18.300 | that were occurring at varying frequencies.
01:28:23.220 | That analysis, that rational analysis,
01:28:26.940 | actually led to models that got integrated
01:28:32.220 | within the architecture that then became validated
01:28:35.380 | through multiple trials, that then became validated
01:28:37.660 | with respect to MRI scans, and is now
01:28:39.900 | being used to both do study back with humans,
01:28:44.700 | but also develop systems that interact well with humans.
01:28:48.700 | So I think that in and of itself ends up being an example.
01:28:52.140 | It's a cheat, but--
01:28:53.180 | The UAV, the SOAR UAV system, I believe,
01:29:03.140 | is a single robot that has multiple agents running on it.
01:29:09.980 | Where is this?
01:29:12.900 | I got it off your website.
01:29:14.980 | But either way, your systems allow for multi-agents.
01:29:19.140 | So my question is, how are you preventing them
01:29:22.180 | from converging with new data?
01:29:24.540 | And are you changing what they're forgetting selectively
01:29:28.780 | as one of those ways?
01:29:30.660 | So I'll say, yes, you can have multi-agent SOAR systems
01:29:33.980 | on a single system, on multiple systems.
01:29:36.700 | There's not any real strong theory
01:29:41.100 | that relates to multi-agent systems.
01:29:43.300 | So there's no real constraint there
01:29:44.900 | that you can come up with a protocol for them interacting.
01:29:48.860 | Each one is going to have its own set of memories,
01:29:52.500 | set of knowledge.
01:29:54.500 | There really is no constraint on you
01:29:56.180 | being able to communicate like you would if it were
01:29:59.620 | any other system interacting with SOAR.
01:30:01.500 | So I don't really think I have a great answer for it.
01:30:06.740 | So that is to say, if you had good theories, good algorithms
01:30:11.280 | about how multi-agent systems work
01:30:13.220 | and how they can bring knowledge together,
01:30:18.380 | in a fusion sort of way, it might be something
01:30:21.820 | that you could bring to a multi-agent SOAR system.
01:30:24.700 | But there's nothing really there to help you.
01:30:27.420 | There's no mechanisms there really
01:30:28.860 | to help you do that any better than you would otherwise.
01:30:31.820 | And you would have to kind of constrain
01:30:35.660 | some of your representations and processes
01:30:37.340 | to what it has fixed in terms of its sort of memory
01:30:39.700 | and its sort of processing cycle.
01:30:43.020 | Thank you.
01:30:44.020 | Great.
01:30:44.520 | With that, let's please give Nate a round of applause.
01:30:46.980 | [APPLAUSE]