
MIT AGI: Cognitive Architecture (Nate Derbinsky)


Chapters

0:00 Intro
0:43 Outline
2:40 Expectations Meet (Current) Reality
3:52 Common Motivations
5:25 Motivations/Questions Dictate Approach
10:33 Unified Theories of Cognition
12:07 Making (Scientific) Progress Lakatos 1970
13:43 Time Scales of Human Action Newell 1990
15:32 Bounded Rationality Simon 1967
17:36 Physical Symbol System Hypothesis Newell & Simon 1976
19:36 Active Architectures by Focus
21:26 Semantic Pointer Architecture Unified Network
23:32 Prototypical Architecture
29:10 ACT-R Notes
33:58 Soar Notes
37:47 LuminAI ADAM Lab @ GATech
40:54 Rosie
45:52 Soar 9 [Laird 2012] Memory Integration
47:53 One Research Path
48:54 Problem ala Word-Sense Disambiguation
50:23 WSD Evaluation Historical Memory Retrieval Bias
50:58 Efficiency
52:30 Related Problem: Memory Size
54:33 Efficient Implementation
54:39 Approximation Quality
54:55 Prediction Complexity
55:02 Prediction Computation
55:07 Task #1: Mobile Robotics
55:52 Results: Decision Time
56:29 Task #2: Liar's Dice Michigan Liar's Dice
56:56 Reasoning -- Action Knowledge
57:23 Forgetting Action Knowledge
58:18 Results: Competence
59:23 Some CogArch Open Issues

Whisper Transcript

00:00:00.000 | So today we have Nate Derbinsky.
00:00:02.720 | He's a professor at Northeastern University
00:00:05.140 | working on various aspects of computational agents
00:00:08.720 | that exhibit human level intelligence.
00:00:11.520 | Please give Nate a warm welcome.
00:00:13.780 | (audience applauding)
00:00:16.940 | - Thanks a lot and thanks for having me here.
00:00:20.800 | So the title that was on the page was Cognitive Modeling.
00:00:25.680 | I'll kind of get there, but I wanted to put it in context.
00:00:28.200 | So the bigger theme here is I wanna talk about
00:00:31.160 | what's called cognitive architecture.
00:00:33.000 | And if you've never heard about that before, that's great.
00:00:35.800 | And I wanted to contextualize that as
00:00:37.760 | how is that one approach to get us to AGI?
00:00:41.900 | I'm gonna say what my view of AGI is
00:00:46.400 | and put up a whole bunch of TV and movie characters
00:00:50.120 | that I grew up with that inspire me.
00:00:52.540 | That'll lead us into
00:00:53.380 | what is this thing called cognitive architecture?
00:00:55.320 | It's a whole research field that crosses neuroscience,
00:00:58.760 | psychology, cognitive science, and all the way into AI.
00:01:02.360 | So I'll try to give you kind of the historical,
00:01:04.340 | big picture view of it,
00:01:05.920 | what some of the actual systems are out there
00:01:08.520 | that might be of interest to you.
00:01:10.000 | And then we'll kind of zoom in on one of them
00:01:11.820 | that I've done a good amount of work with called SOAR.
00:01:14.640 | And what I'll try to do is tell a story,
00:01:17.600 | a research story of how we started
00:01:20.220 | with kind of a core research question.
00:01:22.700 | We'd look to how humans operate,
00:01:27.040 | understood that phenomenon,
00:01:28.600 | and then took it and saw really interesting results from it.
00:01:31.400 | And so at the end, if this field is of interest,
00:01:34.240 | there's a few pointers for you to go read more
00:01:37.100 | and go experience more of cognitive architecture.
00:01:39.540 | So just rough definition of AGI, given this is an AGI class.
00:01:45.600 | Depending on the direction that you're coming from,
00:01:49.480 | it might be kind of understanding intelligence
00:01:51.720 | or it might be developing intelligence systems
00:01:54.680 | that are operating at the level of human level intelligence.
00:01:57.520 | The typical differences between this
00:02:00.680 | and other sorts of maybe AI machine learning systems,
00:02:03.760 | we want systems that are gonna persist
00:02:06.000 | for a long period of time.
00:02:07.960 | We want them robust to different conditions.
00:02:10.760 | We want them learning over time.
00:02:12.560 | And here's the crux of it, working on different tasks.
00:02:16.060 | And in a lot of cases,
00:02:17.520 | tasks they didn't know were coming ahead of time.
00:02:21.560 | I got into this because I clearly watched too much TV
00:02:25.160 | and too many movies.
00:02:26.720 | And then I looked back at this and I realized,
00:02:28.400 | I think I'm covering 70s, 80s, 90s,
00:02:32.640 | noughts, I guess it is, and today.
00:02:35.220 | And so this is what I wanted out of AI
00:02:38.760 | and this is what I wanted to work with.
00:02:41.280 | And then there's the reality that we have today.
00:02:44.800 | So instead of, so who's watched "Knight Rider" for instance?
00:02:51.000 | I don't think that exists yet,
00:02:53.860 | but maybe we're getting there.
00:02:56.560 | And in particular, for fun, during the Amazon sale day,
00:03:01.260 | I got myself an Alexa
00:03:03.200 | and I could just see myself at some point saying,
00:03:05.280 | hey Alexa, please write me an rsync script
00:03:07.880 | to sync my class.
00:03:10.560 | And if you have an Alexa,
00:03:11.720 | you probably know the following phrase.
00:03:13.400 | This just always hurts me inside, which is,
00:03:16.240 | sorry, I don't know that one.
00:03:19.520 | Which is okay, right?
00:03:20.440 | That's, a lot of people have no idea what I'm asking,
00:03:23.440 | let alone how to do that.
00:03:24.820 | So what I want Alexa to respond with after that is,
00:03:28.720 | do you have time to teach me?
00:03:30.160 | And to provide some sort of interface
00:03:33.300 | by which back and forth we can kind of talk through this.
00:03:36.280 | We aren't there yet, to say the least,
00:03:39.440 | but I'll talk later about some work on a system called Rosie
00:03:44.440 | that's working in that direction.
00:03:46.460 | We're starting to see some ideas
00:03:48.480 | about being able to teach systems how to work.
00:03:51.200 | So folks who are in this field,
00:03:54.600 | I think generally fall into these three categories.
00:03:58.760 | They're just curious.
00:03:59.880 | They want to learn new things,
00:04:01.560 | generate knowledge, work on hard problems.
00:04:04.160 | Great.
00:04:05.120 | I think there are folks who are in
00:04:06.960 | kind of that middle cognitive modeling realm.
00:04:09.800 | And so I'll use this term a lot.
00:04:11.960 | It's really understanding how humans think,
00:04:15.200 | how humans operate, human intelligence at multiple levels.
00:04:18.880 | And if you can do that,
00:04:20.680 | one, there's just knowledge in and of itself
00:04:22.920 | of how we operate,
00:04:23.880 | but there's a lot of really important applications
00:04:26.020 | that you can think of.
00:04:27.000 | If we were able to not only understand,
00:04:30.320 | but predict how humans would respond,
00:04:32.880 | react in various tasks.
00:04:35.280 | Medicine is an easy one.
00:04:37.960 | There's some work in HCI or HRI,
00:04:40.720 | I'll get to later,
00:04:42.620 | where if you can predict how humans would respond to a task,
00:04:46.080 | you can iterate tightly and develop better interfaces.
00:04:50.200 | It's already being used in the realm of simulation
00:04:52.520 | and in defense industries.
00:04:55.200 | I happen to fall into the latter group,
00:04:57.880 | or the bottom group,
00:04:58.840 | which is systems development,
00:05:00.640 | which is to say just the desire to build systems
00:05:02.960 | for various tasks that are working on tasks
00:05:05.600 | that kind of current AI machine learning can't operate on.
00:05:09.840 | And I think when you're working at this level
00:05:12.620 | or on any system that nobody's really achieved before,
00:05:16.660 | what do you do?
00:05:17.500 | You kind of look to the examples that you have,
00:05:19.300 | which in this case that we know of,
00:05:21.940 | it's just humans, right?
00:05:23.380 | Irrespective of your motivation,
00:05:27.880 | when you have kind of an intent
00:05:31.500 | that you want to achieve in your research,
00:05:33.280 | you kind of let that drive your approach.
00:05:35.740 | And so I often show my AI students this.
00:05:39.520 | The Turing test you might've heard of,
00:05:41.660 | or variants of it that have come before,
00:05:44.820 | these were folks who were trying to create systems
00:05:47.020 | that acted in a certain way,
00:05:48.340 | that acted intelligently.
00:05:49.900 | And the kind of line that they drew,
00:05:51.980 | the benchmark that they used was to say,
00:05:54.200 | let's make systems that operate like humans do.
00:05:57.000 | Cognitive modelers will fit up into this top point here
00:06:01.380 | to say it's not enough to act that way,
00:06:04.420 | but by some definition of thinking,
00:06:07.660 | we want the system to do what humans do,
00:06:11.080 | or at least be able to make predictions about it.
00:06:12.980 | So that might be things like,
00:06:14.560 | what errors would the human make on this task?
00:06:17.020 | Or how long would it take them to perform this task?
00:06:19.800 | Or what emotion would be produced in this task?
00:06:22.780 | There are folks who are still thinking about
00:06:26.240 | how the computer is operating,
00:06:28.100 | but trying to apply kind of rational rules to it.
00:06:32.600 | So a logician, for instance, would say,
00:06:35.440 | if you have A, and A gives you B,
00:06:37.860 | and B gives you C,
00:06:39.160 | then A should definitely give you C.
00:06:41.320 | That's just what's rational.
00:06:42.580 | And so there are folks operating in that direction.
00:06:44.920 | And then if you go to intro AI class
00:06:47.780 | anywhere around the country,
00:06:48.700 | particularly Berkeley,
00:06:50.180 | because they have graphics designers
00:06:51.820 | that I get to steal from,
00:06:54.180 | the benchmark would be what the system produces
00:06:57.520 | in terms of action,
00:06:59.080 | and the benchmark is some sort of optimal rational bound.
00:07:04.940 | Irrespective of where you work in this space,
00:07:07.240 | there's kind of a common output that arrives
00:07:11.960 | when you research these areas,
00:07:13.480 | which is you can learn individual bits and pieces,
00:07:17.760 | and it can be hard to bring them together
00:07:20.560 | to build a system that either predicts
00:07:22.660 | or acts on different tasks.
00:07:24.960 | So this is part of the transfer learning problem,
00:07:27.040 | but it's also part of having distinct theories
00:07:30.840 | that are hard to combine together.
00:07:32.640 | So I'm gonna give an example
00:07:33.480 | that comes out of cognitive modeling,
00:07:35.860 | or perhaps three examples.
00:07:37.220 | So if you were in a HCI class
00:07:40.020 | or some intro psychology classes,
00:07:42.580 | one of the first things you learn about is Fitts' Law,
00:07:45.320 | which provides you the ability
00:07:46.900 | to predict the difficulty level
00:07:50.960 | of basically human pointing
00:07:53.120 | from where they start to a particular place.
00:07:55.760 | And it turns out that you can learn some parameters
00:07:58.720 | and model this based upon just the distance
00:08:02.020 | from where you are to the target
00:08:04.080 | and the size of the target.
00:08:06.040 | So both moving a long distance will take a while,
00:08:08.500 | but also if you're aiming for a very small point,
00:08:11.200 | that can take longer than if there's a large area
00:08:13.320 | that you just kind of have to get yourself to.
00:08:15.640 | And so this is held true for many humans.
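As a reference for the above, Fitts' Law is commonly written as MT = a + b * log2(D/W + 1); here is a minimal sketch with placeholder parameters rather than values fitted to any person or device:

```python
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Fitts' Law: MT = a + b * log2(D/W + 1).

    a and b are fit per person and input device; the defaults here
    are placeholders for illustration, not measured values.
    """
    return a + b * math.log2(distance / width + 1)

# A far, small target takes longer than a near, large one.
print(fitts_movement_time(distance=800, width=20))   # ~0.90 s
print(fitts_movement_time(distance=100, width=100))  # ~0.25 s
```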
00:08:19.100 | So let's say we've learned this,
00:08:21.080 | and then we move on to the next task,
00:08:22.880 | and we learn about what's called the power law of practice,
00:08:26.640 | which has been shown true in a number of different tasks.
00:08:30.680 | What I'm showing here is one of them,
00:08:32.000 | where you're going to draw a line
00:08:34.560 | through a sequential set of circles here,
00:08:36.760 | starting at one, going to two, and so forth,
00:08:39.680 | not making a mistake, or at least not trying to,
00:08:42.520 | and try to do this as fast as possible.
00:08:44.600 | And so for a particular person,
00:08:47.440 | we would fit the A, B, and C parameters,
00:08:49.420 | and we'd see a power law.
00:08:50.740 | So as you perform this task more,
00:08:53.280 | you're gonna see a decrease in the amount
00:08:56.320 | of reaction time required to complete the task.
00:08:59.320 | Great, we've learned two things about humans.
00:09:02.060 | Let's add some more in.
00:09:03.220 | So for those who might have done
00:09:05.040 | some reinforcement learning,
00:09:05.940 | TD learning is one of those approaches,
00:09:08.260 | temporal difference learning,
00:09:09.860 | that's had some evidence of similar sorts of processes
00:09:14.860 | in the dopamine centers of the brain.
00:09:16.940 | And it basically says in a sequential learning task,
00:09:19.900 | you perform the task, you get some sort of reward.
00:09:23.140 | How are you going to kind of update your representation
00:09:25.780 | of what to do in the future,
00:09:26.800 | such as to maximize expectation of future reward?
00:09:29.940 | And there are various models of how that changes over time,
00:09:33.220 | and you can build up functions that allow you
00:09:35.460 | to perform better and better and better
00:09:36.820 | given trial and error.
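A minimal sketch of the temporal-difference idea being described, written as a tabular TD(0) update (illustrative only, not any particular model from the talk):

```python
def td_update(V, state, next_state, reward, alpha=0.1, gamma=0.9):
    """One tabular TD(0) step: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * td_error
    return td_error

# Toy example: two states, one observed transition with reward 1.
V = {"s0": 0.0, "s1": 0.0}
td_update(V, "s0", "s1", reward=1.0)
print(V)  # {'s0': 0.1, 's1': 0.0} -- V(s0) nudged toward the observed return
```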
00:09:38.620 | Great, so we've learned three interesting models here
00:09:43.580 | that hold true over multiple people, multiple tasks.
00:09:47.500 | And so my question is, if we take these together
00:09:50.260 | and add them together, how do we start to understand
00:09:55.260 | a task as quote unquote simple as chess?
00:09:58.340 | Which is to say, we could ask questions,
00:10:00.780 | how long would it take for a person to play?
00:10:04.740 | What mistakes would they make?
00:10:06.480 | After they played a few games,
00:10:08.380 | how would they adapt themselves?
00:10:10.460 | Or if we want to develop a system
00:10:12.180 | that ended up being good at chess,
00:10:14.340 | or at least learning to become better at chess.
00:10:16.980 | My question is, if you could,
00:10:19.060 | there doesn't seem to be a clear way
00:10:21.460 | to take these very, very individual theories
00:10:24.460 | and kind of smash them together
00:10:26.040 | and get a reasonable answer of how to play chess,
00:10:29.340 | or how do humans play chess?
00:10:30.900 | And so, the gentleman in this slide is Alan Newell,
00:10:37.300 | one of the founders of AI,
00:10:40.540 | did incredible work in psychology and other fields.
00:10:43.740 | He gave a series of lectures at Harvard in 1987,
00:10:47.820 | and they were published in 1990
00:10:49.460 | called the Unified Theories of Cognition.
00:10:51.660 | And his argument to the psychology community at that point
00:10:54.660 | was the argument on the prior slide.
00:10:56.980 | They had many individual studies, many individual results.
00:11:00.940 | And so the question was, how do you bring them together
00:11:03.260 | to gain this overall theory?
00:11:05.020 | How do you make forward progress?
00:11:06.980 | And so his proposal was Unified Theories of Cognition,
00:11:10.420 | which became known as cognitive architecture.
00:11:13.980 | Which is to say, to bring together your core assumptions,
00:11:18.980 | your core beliefs of what are the fixed mechanisms
00:11:22.500 | and processes that intelligent agents
00:11:25.820 | would use across tasks.
00:11:27.740 | So the representations, the learning mechanisms,
00:11:30.660 | the memory systems, bring them together,
00:11:34.500 | implement them in a theory, and use that across tasks.
00:11:38.580 | And the core idea is that when you actually have
00:11:41.220 | to implement this and see how it's going to work
00:11:43.300 | across different tasks, the interconnections
00:11:45.900 | between these different processes and representations
00:11:49.500 | would add constraint.
00:11:51.420 | And over time, the constraints would start limiting
00:11:54.220 | the design space of what is necessary
00:11:56.940 | and what is possible in terms of building
00:11:58.420 | intelligent systems.
00:11:59.620 | And so the overall goal from there was to understand
00:12:02.500 | and exhibit human-level intelligence
00:12:04.640 | using these cognitive architectures.
00:12:06.440 | A natural question to ask is, okay, so we've gone from
00:12:13.420 | a methodology of science that we understand
00:12:17.180 | how to operate in.
00:12:19.180 | We make a hypothesis, we construct a study,
00:12:22.860 | we gather our data, we evaluate that data,
00:12:25.700 | and we falsify or we do not falsify the original hypothesis.
00:12:29.180 | And we can do that over and over again,
00:12:30.860 | and we know that we're making forward progress
00:12:32.620 | scientifically.
00:12:33.700 | If I've now taken that model and changed it into,
00:12:36.740 | I have a piece of software,
00:12:38.940 | and it's representing my theories.
00:12:41.820 | And to some extent, I can configure that software
00:12:43.900 | in different ways to work on different tasks.
00:12:46.260 | How do I know that I'm making progress?
00:12:48.700 | And so there's a form of science called Lakatosian,
00:12:53.700 | and it's kind of shown pictorially here
00:12:56.180 | where you start with your core of what your beliefs are
00:13:00.940 | about where you're at, what is necessary
00:13:04.660 | for achieving the goal that you have.
00:13:07.100 | And around that, you'll have kind of ephemeral hypotheses
00:13:10.460 | and assumptions that over time may grow and shrink.
00:13:13.700 | And so you're trying out different things,
00:13:15.040 | trying out different things.
00:13:16.500 | And if an assumption is around there long enough,
00:13:19.100 | it becomes part of that core.
00:13:20.980 | And so as you work on more tasks and learn more,
00:13:24.660 | either by your work or by data coming in
00:13:26.660 | from someone else, the core is growing larger and larger.
00:13:30.500 | You've got more constraints and you've made more progress.
00:13:34.020 | And so what I wanted to look at were in this community,
00:13:38.060 | what are some of the core assumptions
00:13:39.900 | that are driving forward scientific progress?
00:13:42.140 | So one of them actually came out of those lectures
00:13:46.480 | that are referred to as Newell's time scales
00:13:48.340 | of human action.
00:13:49.300 | And so off on the left,
00:13:53.080 | the left two columns are both time units,
00:13:55.000 | just expressed somewhat differently.
00:13:57.220 | Second from the left being maybe more useful
00:13:59.240 | to a lot of us in understanding daily life.
00:14:03.160 | One step over from there would be kind of
00:14:05.140 | at what level processes are occurring.
00:14:07.260 | So the lowest three are down at kind of the substrate,
00:14:11.060 | the neuronal level.
00:14:12.520 | We're building up to deliberate tasks that occur
00:14:15.380 | in the brain and tasks that are operating
00:14:17.680 | on the order of 10 seconds.
00:14:19.360 | Some of these might occur in the psychology laboratory,
00:14:21.720 | but probably a step up to minutes and hours.
00:14:26.440 | And then above that really becomes interactions
00:14:28.500 | between agents over time.
00:14:29.900 | And so if we start with that,
00:14:32.160 | the things to take away is that regular,
00:14:35.600 | the hypothesis is that regularities will occur
00:14:38.760 | at these different time scales and that they're useful.
00:14:41.880 | And so those who operate at that lowest time scale
00:14:45.000 | might be considering neuroscience, cognitive neuroscience.
00:14:48.920 | When you shift up to the next couple levels,
00:14:51.040 | what we would think about in terms of the areas of science
00:14:53.800 | that deal with that would be psychology and cognitive
00:14:55.840 | science, and then we shift up a level and we're talking
00:14:58.280 | about sociology and economics and the interplay
00:15:01.600 | between agents over time.
00:15:04.720 | And so what we'll find with cognitive architecture
00:15:08.360 | is that most of them will tend to sit at the deliberate act.
00:15:11.760 | We're trying to take knowledge of a situation
00:15:14.480 | and make a single decision.
00:15:16.600 | And then sequences of decisions over time
00:15:19.040 | will build to tasks and tasks over time
00:15:21.520 | will build to more interesting phenomenon.
00:15:24.000 | I'm actually going to show that that isn't strictly true,
00:15:26.320 | that there are folks working in this field
00:15:27.800 | that actually do operate one level below.
00:15:30.560 | Some other assumptions.
00:15:33.640 | So this is Herb Simon receiving the Nobel Prize in economics
00:15:38.240 | and part of what he received that award for
00:15:41.480 | was an idea of bounded rationality.
00:15:44.600 | So in various fields, we tend to model humans as rational.
00:15:49.240 | And his argument was, let's consider that human beings
00:15:54.240 | are operating under various kinds of constraints.
00:15:58.600 | And so to model the rationality with respect to
00:16:01.480 | and bounded by how complex the problem is
00:16:04.320 | that they're working on, how big is that search space
00:16:06.480 | that they have to conquer.
00:16:08.800 | Cognitive limitations.
00:16:10.320 | So speed of operations, amount of memory,
00:16:15.160 | short term as well as long term,
00:16:16.680 | as well as other aspects of our computing infrastructure
00:16:20.000 | that are gonna keep us from being able to
00:16:22.000 | arbitrarily solve complex problems,
00:16:24.840 | as well as how much time is available to make that decision.
00:16:28.120 | And so this is actually a phrase that came out of his speech
00:16:32.400 | when he received the Nobel Prize.
00:16:34.080 | Decision makers can satisfice
00:16:36.680 | either by finding optimum solutions for a simplified world,
00:16:40.040 | which is to say, take your big problem,
00:16:42.120 | simplify it in some way, and then solve that.
00:16:44.680 | Or by finding satisfactory solutions
00:16:47.480 | for a more realistic world.
00:16:48.920 | Take the world in all its complexity,
00:16:50.720 | take the problem in all its complexity,
00:16:52.600 | and try to find something that works.
00:16:55.200 | Neither approach in general dominates the other,
00:16:57.080 | and both have continued to coexist.
00:16:59.120 | And so what you're actually going to see
00:17:01.320 | throughout the cognitive architecture community
00:17:03.240 | is this understanding that some problems
00:17:07.160 | you're not gonna be able to get an optimal solution to
00:17:09.760 | if you consider, for instance,
00:17:11.960 | bounded amount of computation, bounded time,
00:17:14.640 | the need to be reactive to a changing environment,
00:17:17.480 | these sorts of issues.
00:17:18.920 | And so in some sense, we can decompose problems
00:17:22.040 | that come up over and over again into simpler problems,
00:17:25.080 | solve those near optimally or optimally,
00:17:28.920 | fix those in, optimize those,
00:17:31.800 | but more general problems we might have to satisfice.
00:17:34.600 | There's also the idea of the physical symbol system hypothesis.
00:17:38.840 | So this is Alan Newell and Herb Simon there
00:17:43.640 | considering how a computer could play the game of chess.
00:17:46.600 | So the physical symbol system
00:17:49.600 | talks about the idea of taking something,
00:17:51.800 | some signal abstractly referred to as symbol,
00:17:55.720 | combining them in some ways to form expressions,
00:17:58.080 | and then having operations that produce new expressions.
00:18:02.440 | A weak interpretation of the idea that symbol systems
00:18:07.360 | are necessary and sufficient for intelligent systems,
00:18:09.800 | a very weak way of talking about it is the claim
00:18:12.520 | that there's nothing unique
00:18:14.760 | about the neuronal infrastructure that we have,
00:18:17.240 | but if we got the software right,
00:18:20.360 | we could implement it in the bits, bytes,
00:18:23.080 | RAM, and processor that make up modern computers.
00:18:25.640 | That's kind of the weakest way to look at this,
00:18:27.840 | that we can do it with silicon and not carbon.
00:18:32.000 | Stronger way that this used to be looked at
00:18:36.720 | as more of a logical standpoint,
00:18:38.560 | which is to say if we can encode rules of logic,
00:18:42.600 | these tend to line up if we think intuitively
00:18:45.240 | of planning and problem solving.
00:18:47.920 | And if we can just get that right
00:18:49.480 | and get enough facts in there and enough rules in there
00:18:52.360 | that somehow, well,
00:18:54.240 | that's what we need for intelligence,
00:18:55.920 | and eventually we can get
00:18:58.440 | to the point of intelligence.
00:19:01.160 | And that was a starting point that lasted for a while.
00:19:04.800 | I think by now most folks in this field
00:19:08.280 | would agree that that's necessary
00:19:11.440 | to be able to operate logically,
00:19:13.200 | but that there are going to be representations
00:19:15.320 | and processes that'll benefit
00:19:17.640 | from non-symbolic representation,
00:19:19.160 | so particularly perceptual processing,
00:19:21.600 | visual, auditory, and processing things
00:19:24.600 | in a more kind of standard machine learning sort of way,
00:19:27.560 | as well as kind of taking advantage
00:19:31.960 | of statistical representations.
00:19:34.720 | So we're getting closer to actually looking
00:19:39.320 | at cognitive architectures.
00:19:41.160 | I did want to go back to the idea
00:19:42.960 | that different researchers are coming
00:19:45.240 | with different research foci,
00:19:46.840 | and we'll start off with kind of the lowest level
00:19:51.640 | in understanding biological modeling.
00:19:54.000 | So Lieber and Spahn both try to model
00:19:57.160 | different degrees of low-level details,
00:20:00.480 | parameters, firing rates,
00:20:03.160 | connectivities between different kind of levels
00:20:07.080 | of neuronal representations.
00:20:09.600 | They build that up,
00:20:10.440 | and then they try to build tasks above that layer,
00:20:13.280 | but always being very cautious about being
00:20:16.720 | true to human biological processes.
00:20:23.040 | And a layer above there would be psychological modeling,
00:20:25.800 | which is to say trying to build systems
00:20:29.040 | that are true in some sense to areas of the brain,
00:20:32.600 | interactions in the brain,
00:20:33.760 | and being able to predict errors that we made,
00:20:37.720 | timing that we produced by the human mind.
00:20:40.400 | And so there I'll talk a little bit about ACT-R.
00:20:43.160 | This final level down here,
00:20:46.040 | these are systems that are focused
00:20:47.720 | mainly on producing functional systems
00:20:50.400 | that exhibit really cool artifacts
00:20:55.400 | and solve really cool problems.
00:20:57.080 | And so I'll spend most of the time
00:20:58.520 | talking about SOAR,
00:20:59.560 | but I wanted to point out a relative newcomer
00:21:02.360 | in the game called Sigma.
00:21:03.720 | So to talk about Spahn a little bit,
00:21:07.520 | we'll see if the sound works in here.
00:21:10.000 | I'm going to let the creator take this one.
00:21:14.160 | Or not.
00:21:17.680 | See how the AV system likes this.
00:21:23.480 | There we go.
00:21:28.160 | (soft music)
00:21:30.560 | - My name is Chris Eliasmith,
00:21:32.760 | and I'm the director of the Centre for Theoretical
00:21:34.480 | Neuroscience at the University of Waterloo.
00:21:36.760 | And I'm actually jointly appointed
00:21:38.000 | between philosophy and engineering.
00:21:39.800 | The philosophy allows me to consider
00:21:42.000 | general conceptual issues about how the mind works.
00:21:44.600 | But of course, if I want to make claims
00:21:46.240 | about how the mind works,
00:21:47.400 | I have to understand also how the brain works.
00:21:49.000 | And this is where engineering plays a critical role.
00:21:51.440 | Engineering allows me to break down equations
00:21:54.000 | and very precise descriptions,
00:21:55.520 | which we can test by building actual models.
00:21:57.960 | One model that we built recently
00:21:59.200 | is called the Spaun model.
00:22:00.800 | This model, Spaun, has about two and a half million
00:22:03.120 | individual neurons that are simulated in it.
00:22:05.440 | And the input to the model is an eye,
00:22:07.400 | and the output from the model is a movement of an arm.
00:22:10.680 | So essentially, it can see images of numbers
00:22:13.040 | and then do something like categorize them,
00:22:15.160 | in which case it would just draw the number that it sees.
00:22:17.480 | Or it can actually try to reproduce the style
00:22:19.040 | of the number that it's looking at.
00:22:20.680 | So for instance, if it sees a loopy two,
00:22:22.720 | a two with a big loop on the bottom,
00:22:24.040 | it can actually reproduce that particular style too.
00:22:27.200 | On the medical side, we all know that
00:22:29.160 | we have cognitive challenges that show up
00:22:30.960 | as we get older, and we can try to address those challenges
00:22:33.720 | by simulating the aging process with these kinds of models.
00:22:36.520 | Another potential area of impact
00:22:38.000 | is on artificial intelligence.
00:22:39.840 | A lot of work in artificial intelligence
00:22:41.600 | attempts to build agents that are extremely good
00:22:43.320 | at one task, for instance, playing chess.
00:22:45.640 | What's special about Spaun
00:22:47.240 | is that it's quite good at many different tasks.
00:22:49.320 | And this adds the additional challenge
00:22:51.080 | of trying to figure out how to coordinate
00:22:52.560 | the flow of information through different parts
00:22:54.440 | of the model, something that animals
00:22:56.120 | seem to be very good at.
00:22:57.320 | So I'll provide a pointer at the end.
00:23:04.080 | He's got a really cool book called "How to Build a Brain."
00:23:06.440 | And if you Google him, you can, Google Spahn,
00:23:09.280 | you can find a toolkit where you can kind of
00:23:12.840 | construct circuits that will approximate functions
00:23:16.080 | that you're interested in, connect them together,
00:23:18.720 | set certain properties that you would want at a low level,
00:23:22.240 | and build them up, and actually work on tasks
00:23:25.800 | at the level of vision and robotic actuation.
00:23:28.880 | So that's a really cool system.
00:23:31.400 | As we move into architectures that are sitting
00:23:35.600 | above that biological level, I wanted to give you
00:23:39.040 | kind of an overall sense of what they're going to look like,
00:23:41.040 | what a prototypical architecture is going to look like.
00:23:44.240 | So they're gonna have some ability to have perception.
00:23:47.440 | The modalities typically are more digital symbolic,
00:23:52.520 | but they will, depending on the architecture,
00:23:56.000 | be able to handle vision, audition,
00:23:59.960 | and various sensory inputs.
00:24:02.440 | These will get represented in some sort
00:24:04.360 | of short-term memory, whatever the state representation
00:24:07.160 | for the particular system is.
00:24:08.680 | It's typical to have a representation of the knowledge
00:24:14.040 | of what tasks can be performed,
00:24:16.120 | when they should be performed, how they should be controlled.
00:24:18.960 | And so these are typically both actions
00:24:21.380 | that take place internally that manage
00:24:24.240 | the internal state of the system,
00:24:26.760 | and perform internal computations,
00:24:28.840 | but also about external actuation.
00:24:31.600 | And external might be a digital system, a game AI,
00:24:34.760 | but it might also be some sort of
00:24:36.560 | robotic actuation in the real world.
00:24:38.360 | There's typically some sort of mechanism
00:24:41.560 | by which to select from the available actions
00:24:44.880 | in a particular situation.
00:24:46.640 | There's typically some way to augment
00:24:49.760 | this procedural information, which is to say,
00:24:52.800 | learn about new actions, possibly modify existing ones.
00:24:56.060 | There's typically some semblance
00:24:58.240 | of what's called declarative memory.
00:25:00.520 | So whereas procedural, at least in humans,
00:25:03.020 | if I asked you to describe how to ride a bike,
00:25:07.480 | you might be able to say, get on the seat and pedal,
00:25:11.760 | but in terms of keeping your balance there,
00:25:13.680 | you'd have a pretty hard time describing it declaratively.
00:25:18.180 | So that's kind of the procedural side,
00:25:20.300 | the implicit representation of knowledge,
00:25:22.220 | whereas declarative would include facts, geography, math,
00:25:27.220 | but it could also include experiences that the agent has had,
00:25:30.520 | a more episodic representation of declarative memory.
00:25:33.300 | And they'll typically have some way
00:25:34.820 | of learning this information, augmenting it over time.
00:25:38.340 | And then finally, some way of taking actions in the world.
00:25:42.180 | And they'll all have some sort of cycle,
00:25:45.260 | which is perception comes in,
00:25:47.540 | knowledge that the agent has is brought to bear on that,
00:25:51.220 | an action is selected,
00:25:52.980 | knowledge that knows to condition on that action
00:25:55.240 | will act accordingly, both with internal processes,
00:25:57.980 | as well as eventually to take action,
00:25:59.780 | and then rinse and repeat.
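To make that cycle concrete, here is a toy sketch of the perceive/propose/select/act loop (my own illustration of the prototypical structure, not any particular architecture's API; all names are hypothetical):

```python
def cognitive_cycle(perceive, rules, select, act, steps=10):
    """Toy version of the prototypical cycle: perception updates
    short-term memory, procedural rules propose actions, one action
    is selected and executed, and then the loop repeats."""
    short_term_memory = {}
    for _ in range(steps):
        short_term_memory["percept"] = perceive()
        proposals = [p for rule in rules for p in rule(short_term_memory)]
        if proposals:
            act(select(proposals), short_term_memory)

# Example: a thermostat-like agent with one hand-written rule.
cognitive_cycle(
    perceive=lambda: 18,  # degrees C
    rules=[lambda stm: ["heat_on"] if stm["percept"] < 20 else ["heat_off"]],
    select=lambda proposals: proposals[0],
    act=lambda action, stm: stm.update(last_action=action),
)
```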
00:26:01.180 | So when we talk about, in an AI system, an agent,
00:26:06.220 | in this context, that would be the fixed representation,
00:26:09.120 | which is whatever architecture we're talking about,
00:26:11.580 | plus a set of knowledge that is typically specific
00:26:15.620 | to the task, but might be more general.
00:26:17.540 | So oftentimes, these systems could incorporate
00:26:20.020 | a more general knowledge base of facts,
00:26:23.340 | of linguistic facts, of geographic facts.
00:26:27.340 | Let's take Wikipedia,
00:26:28.660 | and let's just stick it in the brain of the system,
00:26:30.600 | that'd be more task in general.
00:26:32.460 | But then also, whatever it is that you're doing right now,
00:26:35.240 | how should you proceed in that?
00:26:36.840 | And then it's typical to see this processing cycle.
00:26:40.720 | And going back to the prior assumption,
00:26:43.660 | the idea is that these primitive cycles
00:26:47.340 | allow for the agent to be reactive to its environment.
00:26:50.620 | So if new things come in that it has to react to,
00:26:52.860 | if the lion's sitting over there,
00:26:54.180 | I better run and maybe not do my calculus homework, right?
00:26:57.300 | So as long as this cycle is going, I'm reactive,
00:27:01.100 | but at the same time, if multiple actions
00:27:03.200 | are taken over time, I'm able to get complex behavior
00:27:06.260 | over the long term.
00:27:07.440 | So this is the ACT-R cognitive architecture.
00:27:13.620 | It has many of the kind of core pieces
00:27:16.340 | that I talked about before.
00:27:18.580 | Let's see if the, is the mouse,
00:27:21.740 | yes, mouse is useful up there.
00:27:23.920 | So we have the procedural model here.
00:27:26.460 | A short term memory is going to be these buffers
00:27:29.180 | that are on the outside.
00:27:31.280 | The procedural memory is encoded as
00:27:33.440 | what are called production rules, or if-then rules.
00:27:37.300 | If this is the state of my short term memory,
00:27:40.160 | this is what I think should happen as a result.
00:27:42.860 | You have a selection of the appropriate rule to fire
00:27:47.860 | and an execution.
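Purely as an illustration (this is hypothetical Python, not ACT-R's actual Lisp syntax), a production rule can be thought of as an if-then pattern over buffer contents, with a matching rule selected and fired:

```python
# Hypothetical buffers and one production, loosely in the spirit of
# "if this is the state of my short-term memory, then do this."
buffers = {"goal": {"task": "add", "a": 2, "b": 3}, "retrieval": None}

def add_fact_request(bufs):
    """If the goal is to add and nothing has been retrieved yet,
    request the relevant fact from declarative memory."""
    goal = bufs["goal"]
    if goal and goal.get("task") == "add" and bufs["retrieval"] is None:
        return {"retrieval": {"request": ("add", goal["a"], goal["b"])}}
    return None  # conditions not met, rule does not match

matched = [a for rule in (add_fact_request,) if (a := rule(buffers))]
if matched:
    buffers.update(matched[0])  # fire the selected production
print(buffers["retrieval"])     # {'request': ('add', 2, 3)}
```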
00:27:50.060 | You're seeing associated parts of the brain
00:27:53.260 | being represented here.
00:27:54.560 | A cool thing that has been done over time
00:27:56.260 | in the ACT-R community is to make predictions
00:28:01.260 | about brain areas and then perform MRIs
00:28:04.540 | and gather that data and correlate that data.
00:28:06.420 | So when you use the system, you will get predictions
00:28:09.460 | about things like timing of operations,
00:28:12.620 | errors that will occur, probabilities
00:28:14.720 | that something is learned, but you'll also get predictions
00:28:17.340 | about, to the degree that they can,
00:28:19.740 | kind of brain areas that are going to light up.
00:28:22.420 | And if you want to, that's actively being developed
00:28:27.480 | at Carnegie Mellon.
00:28:28.760 | To the left is John Anderson, who developed
00:28:32.700 | this cognitive architecture, ooh, 30-ish years ago.
00:28:38.940 | And until the last about five years,
00:28:40.700 | he was the primary researcher/developer behind it
00:28:43.740 | with Christian, and then recently,
00:28:45.620 | he's decided to spend more time
00:28:47.780 | on cognitive tutoring systems.
00:28:50.200 | And so Christian has become the primary developer.
00:28:53.500 | There is an annual ACT-R workshop.
00:28:57.260 | There's a summer school, which if you're thinking
00:29:01.380 | about modeling a particular task,
00:29:03.320 | you can kind of bring your task to them,
00:29:04.820 | bring your data, they teach you how to use the system,
00:29:07.040 | and try to get that study going right there on the spot.
00:29:09.940 | To give you a sense of what kinds of tasks
00:29:14.280 | this could be applied to, so this is representative
00:29:18.960 | of a certain class of tasks, certainly not the only one.
00:29:21.860 | Let's try this again.
00:29:25.760 | Think PowerPoint's gonna want a restart every time.
00:29:28.720 | Okay, so we're getting predictions
00:29:31.600 | about basically where the eye is going to move.
00:29:34.240 | What you're not seeing is it's actually processing
00:29:36.880 | things like text and colors and making predictions
00:29:39.520 | about what to do and how to represent the information
00:29:41.920 | and how to process the graph as a whole.
00:29:44.640 | I had alluded to this earlier.
00:29:47.120 | There's work by Bonnie John, very similar,
00:29:50.800 | so making predictions about how humans
00:29:52.660 | would use computer interfaces.
00:29:55.280 | At the time, she got hired away by IBM,
00:29:57.800 | and so they wanted the ability to have software
00:29:59.880 | that you can put in front of software designers,
00:30:03.460 | and when they think they have a good interface,
00:30:05.280 | press a button, this model of human cognition
00:30:08.320 | would try to perform the tasks that it had been told to do
00:30:11.540 | and make predictions about how long it would take,
00:30:13.280 | and so you can have this tight feedback loop
00:30:16.060 | from designers saying, "Here's how good
00:30:17.480 | "your particular interface is."
00:30:20.840 | So ACT-R as a whole is very prevalent in this community.
00:30:24.040 | I went to their webpage and counted up
00:30:25.820 | just the papers that they knew about.
00:30:28.700 | It was over 1,100 papers over time.
00:30:31.300 | If you're interested in it, the main distribution
00:30:34.560 | is in Lisp, but many people have used this
00:30:38.120 | and wanted to apply it to systems
00:30:39.560 | that need a little more processing power.
00:30:42.420 | So the NRL has a Java port of it
00:30:45.480 | that they use in robotics.
00:30:47.040 | The Air Force Research Lab in Dayton
00:30:49.960 | has implemented it in Erlang for parallel processing
00:30:54.180 | of large declarative knowledge bases.
00:30:56.280 | They're trying to do service-oriented architectures with it,
00:30:59.760 | CUDA, because they want what it has to say.
00:31:02.880 | They don't want to wait around for it
00:31:04.040 | to have to figure that stuff out.
00:31:05.640 | So that's the two minutes about ACT-R.
00:31:10.840 | Sigma is a relative newcomer, and it's developed out
00:31:16.440 | at the University of Southern California
00:31:18.000 | by a man named Paul Rosenbloom,
00:31:20.280 | and I'll mention him in a couple minutes
00:31:21.880 | because he was one of the prime developers
00:31:23.480 | of SOAR at Carnegie Mellon.
00:31:26.100 | So he knows a lot about how SOAR works,
00:31:27.880 | and he's worked on it over the years.
00:31:30.040 | And I think originally, I'm gonna speak for him,
00:31:32.480 | and he'll probably say I was wrong.
00:31:34.800 | I think originally it was kind of a mental exercise
00:31:37.720 | of can I reproduce SOAR using a uniform substrate?
00:31:42.720 | I'll talk about SOAR in a little bit.
00:31:45.120 | It's 30 years of research code.
00:31:47.080 | If anybody's dealt with research code,
00:31:49.080 | it's 30 years of C and C++
00:31:52.040 | with dozens of graduate students over time.
00:31:55.560 | It's not pretty at all.
00:31:57.920 | And theoretically, it's got these boxes sitting out here,
00:32:00.960 | and so he re-implemented the core functionality of SOAR
00:32:05.880 | all using factor graphs and message-passing algorithms
00:32:09.200 | under the hood.
00:32:10.120 | He got to that point and then said,
00:32:12.800 | there's nothing stopping me from going further.
00:32:15.040 | And so now I can do all sorts of modern machine learning,
00:32:18.180 | vision optimization sort of things
00:32:20.460 | that would take some time in any other architecture
00:32:23.640 | to be able to integrate well.
00:32:25.840 | So it's been an interesting experience.
00:32:29.100 | It's now gonna be the basis for the Virtual Human Project
00:32:32.240 | out at the Institute for Creative Technologies.
00:32:34.480 | It's an institute associated
00:32:35.840 | with the University of Southern California.
00:32:38.720 | For a while, until recently, you couldn't get your hands on it,
00:32:41.600 | but in the last couple of years,
00:32:42.680 | he's done some tutorials on it.
00:32:44.960 | He's got a public release with documentation.
00:32:47.880 | So that's something interesting to keep an eye on.
00:32:50.320 | But I'm gonna spend all the remaining time
00:32:53.080 | on the SOAR cognitive architecture.
00:32:55.960 | And so you see, it looks quite a bit
00:32:57.840 | like the prototypical architecture.
00:33:00.380 | And I'll give a sense again about how this all operates.
00:33:04.060 | Give a sense of the people involved.
00:33:05.940 | We already talked about Alan Newell.
00:33:07.740 | So both John Laird, who is my advisor,
00:33:10.620 | and Paul Rosenblum were students of Alan Newell.
00:33:14.980 | John's thesis project was related
00:33:18.500 | to the chunking mechanism in SOAR,
00:33:21.080 | which learns new rules based upon sub-goal reasoning.
00:33:26.180 | So he finished that, I believe, the year I was born.
00:33:31.180 | And so he's one of the few researchers you'll find
00:33:34.480 | who's still actively working on their thesis project.
00:33:37.780 | Beyond that, I think about 10 years ago,
00:33:42.840 | he founded SOAR Technology,
00:33:44.300 | which is a company up in Ann Arbor, Michigan.
00:33:47.000 | While it's called SOAR Technology,
00:33:48.160 | it doesn't do exclusively SOAR,
00:33:49.480 | but that's a part of the portfolio.
00:33:51.240 | General intelligence system stuff,
00:33:54.080 | a lot of defense association.
00:33:56.220 | So some notes of what's gonna make SOAR different
00:34:00.880 | from the other architectures that fall
00:34:03.120 | into this kind of functional architecture category.
00:34:06.060 | A big thing is a focus on efficiency.
00:34:08.720 | So John wants to be able to run SOAR on just about anything.
00:34:12.700 | We just got on the SOAR mailing list
00:34:15.280 | a desire to run it on a real-time processor.
00:34:19.200 | And our answer, while we had never done it before,
00:34:21.760 | was probably it'll work.
00:34:25.680 | Every release, there's timing tests.
00:34:27.960 | And we always, what we look at is,
00:34:30.760 | in a bunch of different domains
00:34:32.040 | for a bunch of different reasons
00:34:33.020 | that relate to human processing,
00:34:34.800 | there's this magic number that comes out,
00:34:36.160 | which is 50 milliseconds, which is to say,
00:34:39.040 | in terms of responding to tasks,
00:34:42.200 | if you're above that time, humans will sense a delay.
00:34:46.480 | And you don't want that to happen.
00:34:48.080 | Now, if we're working in a robotics task, 50 milliseconds,
00:34:51.220 | if you're dramatically above that,
00:34:52.760 | you just fell off the curb, or worse,
00:34:55.020 | or you just hit somebody in a car, right?
00:34:57.240 | So we're trying to keep that as low as possible.
00:34:59.560 | And for most agents, it doesn't even register.
00:35:02.640 | It's below one millisecond, fractions of millisecond.
00:35:05.960 | But I'll come back to this,
00:35:07.400 | because a lot of the work that I was doing
00:35:09.000 | was computer science, AI,
00:35:12.080 | and a lot of efficient algorithms and data structures.
00:35:14.380 | And 50 milliseconds was that very high upper bound.
00:35:17.380 | It's also one of the projects
00:35:19.760 | that has a public distribution.
00:35:21.160 | You can get it in all sorts of operating systems.
00:35:24.800 | We use something called SWIG
00:35:26.000 | that allows you to interface with it
00:35:27.480 | in a bunch of different languages.
00:35:28.480 | We kind of describe the meta description,
00:35:30.000 | and you are able to basically generate bindings
00:35:33.640 | in a bunch of different platforms.
00:35:35.760 | Core is C++.
00:35:38.360 | There was a team at SoarTech that said,
00:35:40.400 | "We don't like C++.
00:35:41.720 | "It gets messy."
00:35:42.760 | So they actually did a port over to pure Java,
00:35:45.920 | in case that appeals to you.
00:35:47.560 | There's an annual Soar workshop that takes place
00:35:51.000 | in Ann Arbor, typically.
00:35:52.520 | It's free.
00:35:54.200 | You can go there, get a Soar tutorial,
00:35:55.920 | and talk to folks who are working on Soar.
00:35:57.480 | And it's fun.
00:35:58.600 | I've been there every year but one in the last decade.
00:36:01.640 | It's just fun to see the people around the world
00:36:03.480 | that are using the system in all sorts of interesting ways.
00:36:06.620 | To give you a sense of the diversity of the applications,
00:36:10.160 | one of the first was R1 Sor,
00:36:12.480 | which was back in the days
00:36:13.760 | when it was an actual challenge to build a computer,
00:36:16.520 | which is to say that your choice of certain components
00:36:20.120 | would have radical implications
00:36:23.120 | for other parts of the computer.
00:36:24.480 | So it wasn't just the Dell website where you just,
00:36:26.800 | "I want this much RAM, I want this much CPU."
00:36:29.120 | There was a lot of thinking that went behind it,
00:36:30.720 | and then physical labor that went to construct your computer.
00:36:33.760 | And so it was making that process a lot better.
00:36:37.120 | There are folks that apply it
00:36:38.040 | to natural language processing.
00:36:39.780 | Sor 7 was the core of the Virtual Humans Project
00:36:43.640 | for a long time.
00:36:44.920 | HCI tasks.
00:36:46.520 | TAC Air Sor was one of the largest rule-based systems.
00:36:49.520 | Tens of thousands of rules over 48 hours.
00:36:52.040 | It was a very large-scale simulation,
00:36:54.080 | a defense simulation.
00:36:56.360 | Lots of games it's been applied to for various reasons.
00:36:59.780 | And then in the last few years,
00:37:02.640 | porting it onto mobile robotics platforms.
00:37:04.840 | This is Edwin Olson's SplinterBot,
00:37:07.840 | an early version of it
00:37:08.980 | that went on to win the Magic competition.
00:37:11.620 | Then I went on to put Sor on the web.
00:37:16.780 | And if after this talk you're really interested
00:37:18.960 | in a dice game that I'm gonna talk about,
00:37:20.760 | you can actually go to the iOS app store and download.
00:37:25.000 | It's called Michigan Liars Dice.
00:37:26.360 | It's free.
00:37:27.240 | You don't have to pay for it.
00:37:28.580 | But you can actually play Liars Dice with Sor.
00:37:32.920 | And you can set the difficulty level.
00:37:35.240 | It's pretty good.
00:37:36.720 | It beats me on a regular basis.
00:37:39.320 | I wanted to give you a couple other
00:37:40.600 | just kind of really weird-feeling sort of applications
00:37:44.200 | and really cool applications.
00:37:46.220 | The first one is out of Georgia Tech.
00:37:49.920 | Go PowerPoint.
00:37:51.880 | (upbeat music)
00:37:54.460 | - Lumini is a dome-based interactive art installation
00:38:00.920 | in which human participants can engage
00:38:03.160 | in collaborative movement improvisation
00:38:05.240 | with each other and virtual dance partners.
00:38:08.580 | This interaction creates a hybrid space
00:38:11.120 | in which virtual and corporeal bodies meet.
00:38:14.600 | The line between human and non-human is blurred,
00:38:17.840 | spurring participants to examine
00:38:19.600 | their relationship with technology.
00:38:22.480 | The LuminAI installation ultimately examines
00:38:25.200 | how humans and machine can co-create experiences.
00:38:28.840 | And it does so in a playful environment.
00:38:31.960 | The dome creates a social space
00:38:33.880 | that encourages human-human interaction
00:38:36.040 | and collective dance experiences,
00:38:38.520 | allowing participants to creatively explore movement
00:38:41.520 | while having fun.
00:38:44.160 | The development of LuminAI has been a hybrid exploration
00:38:47.360 | in art forms of theater and dance,
00:38:49.880 | as well as research in artificial intelligence
00:38:52.400 | and cognitive science.
00:38:54.000 | LuminAI draws on inspiration
00:38:57.560 | from the ancient art form of shadow theater.
00:39:00.760 | The original two-dimensional version of the installation
00:39:03.800 | led to the conceptualization of the dome
00:39:06.120 | as a liminal space, with human silhouettes
00:39:09.200 | and virtual characters meeting to dance together
00:39:11.880 | on the projection surface.
00:39:14.560 | Rather than relying on a pre-authored library
00:39:17.200 | of movement responses,
00:39:19.000 | the virtual dancer learns its partner's movements
00:39:21.840 | and utilizes Viewpoints movement theory
00:39:24.080 | to systematically reason about them
00:39:26.000 | in order to improvisationally choose a movement response.
00:39:29.520 | Viewpoints theory is based in dance and theater
00:39:33.360 | and analyzes the performance along the dimensions
00:39:35.960 | of tempo, duration, repetition,
00:39:39.560 | kinesthetic response, shape, spatial relationships,
00:39:43.720 | gesture, architecture, and movement topography.
00:39:47.560 | The virtual dancer is able to use
00:39:50.760 | several different strategies to respond to human movements.
00:39:54.440 | These include mimicry of the movement,
00:40:00.040 | transformation of the movement along Viewpoints dimensions,
00:40:03.080 | recalling a similar or complementary movement from memory
00:40:05.360 | in terms of Viewpoints dimensions,
00:40:05.360 | and applying action-response patterns
00:40:07.400 | that the agent has learned while dancing
00:40:09.440 | with its human partner.
00:40:11.000 | - The reason we did this is this is part
00:40:15.680 | of a larger effort in our lab
00:40:17.560 | for understanding the relationship between
00:40:19.480 | computation, cognition, and creativity,
00:40:23.280 | where a large amount of our efforts
00:40:25.080 | go into understanding human creativity
00:40:27.640 | and how we make things together,
00:40:28.920 | how we're creative together,
00:40:30.560 | as a way to help us understand
00:40:32.000 | how we can build co-creative AI
00:40:35.240 | that serves the same purpose,
00:40:37.280 | where it can be a colleague and collaborate with us
00:40:39.960 | and create things with us.
00:40:41.480 | - So Brian was a graduate student
00:40:50.840 | in John Laird's lab as well.
00:40:53.320 | Before I start this, I alluded to this earlier
00:40:57.360 | where we're getting closer to Rosie saying,
00:40:59.960 | "Can you teach me?"
00:41:01.080 | So let me give you some introduction to this.
00:41:02.840 | In the lower left, you're seeing the view
00:41:05.560 | of a Kinect camera onto a flat surface.
00:41:08.360 | There's a robotic arm, mainly 3D-printed parts,
00:41:12.720 | few servos.
00:41:14.560 | Above that, you're seeing an interpretation of the scene.
00:41:19.280 | We're giving it associations of the four areas
00:41:22.480 | with semantic titles, like one is the table,
00:41:26.320 | one is the garbage, just semantic terms for areas.
00:41:30.480 | But other than that, the agent doesn't actually
00:41:32.480 | know all that much, and it's going to operate
00:41:34.880 | in two modalities.
00:41:35.920 | One is, we'll call it natural language,
00:41:38.200 | natural-ish language, a restricted subset of English,
00:41:43.200 | as well as some quote-unquote pointing.
00:41:46.000 | So you're going to see some mouse pointers
00:41:48.200 | in the upper left saying, "I'm talking about this."
00:41:51.120 | And this is just a way to indicate location.
00:41:54.440 | And so starting off, we're going to say things like,
00:41:56.880 | "Pick up the blue block," and it's going to be like,
00:41:58.560 | "I don't know what blue is.
00:42:00.120 | "What is blue?"
00:42:01.840 | We say, "Oh, well, that's a color."
00:42:03.440 | "Okay, so go get the green thing."
00:42:08.440 | "What's green?"
00:42:09.320 | "Oh, it's a color."
00:42:10.140 | "Okay, move the blue thing to a particular location."
00:42:13.440 | "Where's that?"
00:42:14.680 | Point it, "Okay, what is moving?"
00:42:17.360 | Really, it has to start from the beginning,
00:42:19.480 | and it's described, and it's said,
00:42:20.820 | "Okay, now you've finished."
00:42:22.500 | And once we got to that point, now I can say,
00:42:25.320 | "Move the green thing over here,"
00:42:27.360 | and it's got everything that it needs
00:42:29.280 | to be able to then reproduce the task,
00:42:31.280 | given new parameters, and it's learned that ability.
00:42:33.800 | So let me give it a little bit of time.
00:42:37.020 | So you can look a little bit at top left
00:42:47.720 | in terms of the pointers.
00:42:48.800 | You're going to see some text commands being entered.
00:42:52.060 | So what kind of attribute is blue?
00:42:56.120 | We're going to say it's a color,
00:42:57.720 | and so that can map it then
00:42:59.000 | to a particular sensor modality.
00:43:02.000 | This is green, so the pointing,
00:43:04.000 | what kind of thing is green?
00:43:05.080 | Okay, color, so now it knows how to understand
00:43:07.080 | blue and green as colors with respect to the visual scene.
00:43:10.880 | Move rectangle to the table.
00:43:13.940 | What is rectangle?
00:43:17.640 | Okay, now I can map that onto,
00:43:19.440 | or understanding parts of the world.
00:43:22.080 | Is this the blue rectangle?
00:43:23.200 | So the arm is actually pointing itself
00:43:26.040 | to get confirmation from the instructor,
00:43:28.600 | and then we're trying to understand,
00:43:30.520 | in general, when you say move something,
00:43:32.140 | what is the goal of this operation?
00:43:35.000 | And so then it also has a declarative representation
00:43:37.280 | of the idea of this task,
00:43:38.600 | not only that it completed it,
00:43:40.240 | then it can look back on having completed the task
00:43:42.880 | and understand what were the steps
00:43:44.840 | that led to achieving a particular goal.
00:43:47.120 | So in order to move it, you're going to have to pick it up.
00:43:54.320 | It knows which one the blue thing is.
00:43:56.360 | (mouse clicking)
00:43:59.120 | Great.
00:44:01.560 | Now put it in the table.
00:44:06.960 | So that's a particular location.
00:44:08.560 | At this point, we can say, you're done.
00:44:12.440 | You have accomplished the move the blue rectangle
00:44:14.960 | to the table.
00:44:16.040 | And so now it can understand what that very simple
00:44:18.960 | kind of process is like,
00:44:21.320 | and associate that with the verb to move.
00:44:25.460 | And now we can say move the green object, or not,
00:44:28.940 | to the garbage.
00:44:32.740 | And without any further interaction,
00:44:36.780 | based on everything that learned up till that point,
00:44:40.400 | it can successfully complete that task.
00:44:42.580 | So this is the work of Shiwali Mohan and others
00:44:45.240 | at the SOAR group at the University of Michigan
00:44:47.060 | on the Rosie project.
00:44:49.460 | And they're extending this to playing games
00:44:51.560 | and learning the rules of games
00:44:53.200 | through text-based descriptions and multimodal experience.
00:44:56.920 | So in order to build up to here's a story in SOAR,
00:45:01.040 | I wanted to give you a sense of how research occurs
00:45:03.120 | in the group.
00:45:04.000 | And so there are these back-and-forths that occur over time.
00:45:07.840 | There's this piece of software called SOAR,
00:45:10.720 | and we want to make this thing better
00:45:11.800 | and give it new capabilities,
00:45:13.040 | and so all our agents are going to become better.
00:45:16.080 | And we always have to keep in mind,
00:45:17.360 | and you'll see this as I go further,
00:45:19.200 | that it has to be useful to a wide variety of agents.
00:45:22.400 | It has to be task independent,
00:45:25.040 | and it has to be efficient.
00:45:26.100 | For us to do anything in the architecture,
00:45:27.640 | all of those have to hold true.
00:45:29.300 | So we do something cool in the architecture,
00:45:32.920 | and then we say, okay, let's solve a cool problem.
00:45:35.320 | So let's build some agents to do this.
00:45:37.160 | And so this ends up testing what are the limitations,
00:45:40.360 | what are the issues that arise in a particular mechanism,
00:45:43.980 | as well as integration with others.
00:45:45.880 | And we get to solve interesting problems.
00:45:47.400 | We usually find there was something missing,
00:45:48.920 | and then we can go back to the architecture
00:45:50.800 | and rinse and repeat.
00:45:52.800 | Just to give you an idea, again, how SOAR works.
00:45:55.720 | So the working memory is actually
00:45:57.200 | a directed connected graph.
00:45:59.400 | The perception is just a subset of that graph,
00:46:02.280 | and so there's going to be symbolic representations
00:46:04.600 | of most of the world.
00:46:06.320 | There is a visual subsystem
00:46:07.540 | in which you can provide a scene graph,
00:46:09.580 | just not showing it here.
00:46:11.080 | Actions are also a subset of that graph,
00:46:14.040 | and so the procedural knowledge,
00:46:16.880 | which is encoded as production rules,
00:46:18.320 | can read sections of the input,
00:46:20.580 | modify sections of the output,
00:46:21.920 | as well as arbitrary parts of the graph to take actions.
00:46:25.260 | So the decision procedure says,
00:46:27.000 | of all the things that I know to do,
00:46:28.440 | and I've kind of ranked them
00:46:29.400 | according to various preferences,
00:46:31.340 | what single thing should I do?
00:46:33.000 | There's semantic memory for facts, and there's episodic memory.
00:46:38.120 | The agent is always actually storing
00:46:40.600 | every experience it's ever had over time in episodic memory,
00:46:43.520 | and it has the ability to get back to that.
00:46:45.880 | And so the similar cycle we saw before,
00:46:48.360 | we get input in this perception called the input link.
00:46:51.420 | Rules are going to fire all in parallel and say,
00:46:54.420 | here's everything I know about the situation,
00:46:55.980 | here's all the things I could do.
00:46:57.460 | Decision procedure says, here's what we're going to do.
00:47:01.860 | Based upon the selected operator,
00:47:04.080 | all sorts of things could happen
00:47:05.340 | with respect to memories providing input,
00:47:08.660 | rules firing to perform computations,
00:47:11.180 | and as well as potentially output in the world.
00:47:13.700 | And remember, agent reactivity is required.
00:47:18.180 | We want the system to be able to react
00:47:21.240 | to things in the world at a very quick pace.
00:47:24.200 | So anything that happens in this cycle, at max,
00:47:26.960 | the overall cycle has to be under 50 milliseconds.
00:47:29.840 | And so that's going to be a constraint we hold ourselves to.
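To make that concrete, here is a minimal Python sketch of the cycle just described: rules elaborate and propose in parallel, a single decision procedure picks one operator, and the whole loop is held against the 50 millisecond budget. The function names and structure are illustrative, not SOAR's actual code.

```python
import time

CYCLE_BUDGET_S = 0.050  # the 50 ms reactivity constraint mentioned in the talk

def decision_cycle(working_memory, rules, decide, apply_operator):
    """One pass of the perceive/propose/decide/apply loop (illustrative only)."""
    start = time.perf_counter()

    # 1. Rules "fire in parallel": each reads working memory and returns proposals.
    proposals = []
    for rule in rules:
        proposals.extend(rule(working_memory))

    # 2. The decision procedure picks a single operator from the ranked proposals.
    operator = decide(proposals)

    # 3. Applying the operator may change working memory, trigger retrievals, or output.
    if operator is not None:
        apply_operator(operator, working_memory)

    elapsed = time.perf_counter() - start
    within_budget = elapsed < CYCLE_BUDGET_S   # the reactivity constraint to hold to
    return operator, within_budget
```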
00:47:32.400 | And so the story I'll be telling will say
00:47:35.900 | how we got to a point where we started
00:47:38.040 | actually forgetting things.
00:47:40.120 | And we're an architecture that doesn't want to be like humans,
00:47:42.800 | we want to create cool systems,
00:47:44.740 | but what we realized was that something we humans do,
00:47:48.200 | forgetting, probably has some benefit to it.
00:47:50.160 | And we actually put it into our system
00:47:51.720 | and it led to good outputs.
00:47:53.680 | So here's the research path I'm going to walk down.
00:47:57.080 | We had just a simple problem,
00:47:58.720 | which was we have these memory systems
00:48:01.320 | and sometimes they're going to get a cue
00:48:02.880 | that could relate to multiple memories.
00:48:05.480 | And the question is, if you have a fixed mechanism,
00:48:08.480 | what should you return in a task-independent way?
00:48:11.520 | Which one of these many memories should you return?
00:48:13.840 | That was our question.
00:48:15.300 | And we looked to some human data on this,
00:48:17.620 | something called the rational analysis of memory
00:48:19.220 | done by John Anderson,
00:48:20.540 | and realized that in human language,
00:48:24.420 | there are recency and frequency effects
00:48:26.780 | that maybe would be useful.
00:48:29.220 | And so we actually did analysis,
00:48:31.780 | found that not only does this occur,
00:48:33.460 | but it's useful in what are called
00:48:34.900 | word sense disambiguation tasks.
00:48:36.780 | And I'll get to that, what that means in a second.
00:48:39.700 | Developed some algorithms to scale this really well.
00:48:42.300 | And it turned out to work out well,
00:48:44.020 | not only in the original task,
00:48:45.720 | but when we looked to two other completely different ones,
00:48:48.560 | the same underlying mechanism
00:48:50.780 | ended up producing some really interesting outputs.
00:48:54.240 | So let me talk about word sense disambiguation real quick.
00:48:56.700 | This is a core problem in natural language processing,
00:48:59.260 | if you haven't heard of it before.
00:49:00.780 | Let's say we have an agent,
00:49:02.340 | and for some reason it needs to understand the verb to run.
00:49:05.300 | Looks to its memory and finds that it could run in the park,
00:49:10.860 | it could be running a fever,
00:49:12.260 | could run an election, it could run a program.
00:49:15.220 | And the question is,
00:49:16.060 | what should a task independent memory mechanism return
00:49:20.580 | if all you've been given is the verb to run?
00:49:24.220 | And so the rational analysis of memory
00:49:26.660 | looked through multiple text corpora,
00:49:28.580 | and what they found was,
00:49:30.580 | if a particular word had been used recently,
00:49:33.700 | it's very likely to be reused again.
00:49:36.700 | And if it hadn't been used recently,
00:49:38.460 | there's going to be this effect where, in the expression here,
00:49:41.740 | each t is the time since a prior use,
00:49:44.500 | and it's going to sum those terms, each with its own decay.
00:49:48.740 | So what it looks like if time is going to the right,
00:49:52.980 | activation higher is better.
00:49:55.300 | As you get these individual usages,
00:49:56.980 | you get these little drops and then eventually drop down.
00:49:59.780 | And so if we had just one usage of a word,
00:50:01.980 | the red would be what the decay would look like.
00:50:05.260 | And so the core problem here is,
00:50:07.020 | if we're at a particular point
00:50:08.420 | and we want to select between the blue thing
00:50:10.500 | or the red thing, blue would have a higher activation,
00:50:13.220 | and so maybe that's useful.
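The talk does not write out the equation, but the standard base-level activation from Anderson's rational analysis sums a power-law decay term for every prior use, B = ln(sum over uses of t^(-d)), with t the time since that use and d a decay parameter (commonly 0.5). A small sketch with illustrative values:

```python
import math

def base_level_activation(use_times, now, d=0.5):
    """use_times: times at which this memory was accessed; now: current time.
    Assumes the standard base-level learning form; assumes at least one prior use."""
    return math.log(sum((now - t) ** (-d) for t in use_times if now > t))

# A recently and frequently used sense outranks an old, rarely used one.
run_in_park   = base_level_activation(use_times=[90, 95, 99], now=100)
run_a_program = base_level_activation(use_times=[10], now=100)
assert run_in_park > run_a_program
```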
00:50:15.980 | This is how things are modeled with human memory,
00:50:20.140 | but is it useful in general for tasks?
00:50:22.980 | And so we looked at common corpora
00:50:25.580 | used in word-sense disambiguation and just said,
00:50:28.180 | well, what if we go through each corpus twice
00:50:29.820 | and just use prior answers:
00:50:33.260 | I asked the question, what is the sense of this word?
00:50:35.820 | I took a guess, I got the right answer,
00:50:37.540 | and I used that recency and frequency information
00:50:40.500 | in my task-independent memory.
00:50:42.180 | Would that be useful?
00:50:43.460 | And somewhat of a surprise, but somewhat maybe not
00:50:46.300 | of a surprise, it actually performed really well
00:50:49.300 | across multiple corpora.
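As a rough illustration of that evaluation scheme (illustrative code, not the actual experimental setup): walk a sense-tagged corpus, answer each ambiguous word with the sense whose prior correct answers score highest on recency and frequency, then record the true sense as the new memory access.

```python
from collections import defaultdict

def wsd_with_recency_frequency(tagged_corpus, d=0.5):
    """tagged_corpus: list of (word, true_sense) pairs in order of occurrence."""
    accesses = defaultdict(list)          # (word, sense) -> times that sense was the answer
    correct = 0
    for now, (word, true_sense) in enumerate(tagged_corpus, start=1):
        candidates = [s for (w, s) in accesses if w == word]
        if candidates:
            # Base-level-style score: recency and frequency of prior correct answers.
            score = lambda s: sum((now - t) ** (-d) for t in accesses[(word, s)])
            guess = max(candidates, key=score)
            correct += (guess == true_sense)
        accesses[(word, true_sense)].append(now)   # use the right answer as feedback
    return correct / len(tagged_corpus)

corpus = [("run", "exercise"), ("run", "exercise"), ("run", "program"), ("run", "exercise")]
print(wsd_with_recency_frequency(corpus))
```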
00:50:51.660 | So we said, OK, this seems like a reasonable mechanism.
00:50:57.220 | Let's look at implementing this efficiently
00:50:59.740 | in the architecture.
00:51:00.900 | And the problem was this term right here said,
00:51:04.220 | for every memory, for every time step,
00:51:07.980 | you're having to decay everything.
00:51:11.640 | That doesn't sound like a recipe for efficiency
00:51:13.740 | if you're talking about lots and lots of knowledge
00:51:15.740 | over long periods of time.
00:51:18.460 | So we made use of a nice approximation
00:51:21.620 | that Petrov had come up with to approximate the tail effect.
00:51:25.100 | So accesses that happened long, long ago,
00:51:29.300 | we could basically approximate their effect
00:51:31.020 | on the overall sum.
00:51:32.140 | So we had a fixed set of values.
00:51:35.340 | And what we basically said is, since these are always
00:51:38.220 | decreasing, and all we care about is relative order,
00:51:41.700 | let's just only recompute when someone gets a new value.
00:51:45.220 | So it's a guess.
00:51:46.340 | It's a heuristic, an approximation.
00:51:48.840 | But we looked at how this worked on the same set of corpora.
00:51:53.340 | And in terms of query time, if we made these approximations,
00:51:56.780 | we were well under our 50 milliseconds, and the effect on task performance
00:52:01.520 | was negligible.
00:52:02.300 | In fact, on a couple of these, it
00:52:03.940 | got ever so slightly better in terms of accuracy.
00:52:07.400 | And actually, if we looked at the individual decisions that
00:52:10.520 | were being made, making these sorts of approximations
00:52:14.200 | led to at least 90% of the decisions
00:52:20.160 | being identical to having done the true full calculation.
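A sketch of that kind of tail approximation, in the spirit of the Petrov approach described above: keep the most recent access ages exactly and treat the older accesses as if they were spread evenly over the remaining lifetime. The parameter names are illustrative, and d is assumed not equal to 1.

```python
import math

def activation_exact(ages, d=0.5):
    return math.log(sum(t ** (-d) for t in ages))

def activation_hybrid(recent_ages, n_total, lifetime, d=0.5):
    """recent_ages: the k most recent access ages, sorted ascending.
    n_total: total number of accesses ever; lifetime: age of the very first access."""
    k = len(recent_ages)
    exact_part = sum(t ** (-d) for t in recent_ages)
    tail = 0.0
    if n_total > k:
        t_k = recent_ages[-1]
        # Integral approximation of the summed decay of the n - k older accesses.
        tail = (n_total - k) * (lifetime ** (1 - d) - t_k ** (1 - d)) / ((1 - d) * (lifetime - t_k))
    return math.log(exact_part + tail)

ages = [1, 3, 7, 20, 60, 200, 500]                        # ages of all accesses
approx = activation_hybrid(ages[:3], len(ages), lifetime=500)
print(activation_exact(ages), approx)                      # close, and relative order is what matters
```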
00:52:25.740 | So we said, this is great.
00:52:28.120 | And we implemented this, and it worked really well.
00:52:31.200 | And then we started working on what seemed like completely
00:52:34.500 | unrelated problems.
00:52:35.880 | One was in mobile robotics.
00:52:37.820 | We had a mobile robot I'll show a picture of in a little while
00:52:40.940 | roaming around the halls, performing all sorts of tasks.
00:52:43.780 | And what we were finding was, if you
00:52:46.900 | have a system that's remembering everything
00:52:48.740 | in your short-term memory, and your short-term memory
00:52:50.900 | gets really, really big--
00:52:52.240 | I don't know about you.
00:52:53.340 | My short-term memory feels really, really small.
00:52:55.420 | I would love it to be big.
00:52:57.540 | But if you make your memory really big,
00:52:59.180 | and you try to remember something,
00:53:01.000 | you're now having to pull lots and lots and lots of information
00:53:03.820 | into your short-term memory.
00:53:05.180 | So the system was actually getting slower simply
00:53:07.780 | because it had a lot in short-term memory:
00:53:10.540 | the representation of the overall map it was building.
00:53:14.340 | So a large working memory was a problem.
00:53:17.060 | Liar's Dice is a game you play with dice.
00:53:19.120 | We were doing an RL-based system on this, reinforcement
00:53:21.420 | learning.
00:53:22.580 | And it turned out it's a really, really big value function.
00:53:26.140 | We were having to store lots of data.
00:53:27.860 | And we didn't know which stuff we
00:53:29.220 | had to keep around to keep the performance up.
00:53:33.280 | So we had a hypothesis that forgetting was actually
00:53:36.420 | going to be a beneficial thing.
00:53:38.820 | That maybe the problem we have with our memory
00:53:43.220 | is that we really, really dislike this forgetting thing.
00:53:46.100 | Maybe it's actually useful.
00:53:47.620 | And so we experimented with the following policy.
00:53:49.660 | We said, let's forget a memory if, one,
00:53:55.660 | it's not predicted to be useful by this base-level activation.
00:53:58.640 | We haven't used it recently.
00:53:59.760 | We haven't used it frequently.
00:54:01.060 | Maybe it's not worth it.
00:54:02.500 | That and we felt confident that we could approximately
00:54:06.740 | reconstruct it if we absolutely had to.
00:54:09.420 | And if those two things held, we could forget something.
00:54:13.620 | So it's this same basic algorithm,
00:54:15.780 | but instead of ranking the memories,
00:54:18.500 | we set a threshold for base-level activation,
00:54:22.500 | find when it is that a memory is
00:54:25.040 | going to fall below that threshold, and try
00:54:26.860 | to forget based upon that in a way that's efficient,
00:54:29.740 | one that isn't going to scale really, really poorly.
00:54:33.640 | So we were able to come up with an efficient way
00:54:36.220 | to implement this using an approximation that,
00:54:42.980 | for most memories, ended up being exactly correct.
00:54:48.900 | I'm happy to go over details of this
00:54:50.360 | if anybody's interested later.
00:54:52.340 | But it ended up being a fairly close approximation, one
00:54:55.540 | that, as compared to a completely accurate
00:54:59.900 | search for the value, ended up being somewhere between 15
00:55:04.140 | and 20 times faster.
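Here is a minimal sketch of that forgetting policy and of the scheduling trick, with illustrative names rather than the actual SOAR implementation: forget only what is below the activation threshold and believed reconstructable, and predict when each memory will decay past the threshold instead of rechecking everything on every cycle.

```python
import math

def activation(use_times, now, d=0.5):
    """Base-level activation; assumes at least one use strictly before `now`."""
    return math.log(sum((now - u) ** (-d) for u in use_times))

def should_forget(use_times, now, threshold, reconstructable, d=0.5):
    """Forget only if the memory is both below threshold and believed reconstructable
    (e.g., it still lives in long-term memory or can be recomputed from a heuristic)."""
    return reconstructable and activation(use_times, now, d) < threshold

def predicted_crossing(use_times, now, threshold, d=0.5, horizon=100_000):
    """Predict the time at which activation will decay below the threshold,
    so a check can be scheduled then rather than every decision cycle."""
    t = now + 1.0
    while t < now + horizon:
        if activation(use_times, t, d) < threshold:
            return t
        t += 1.0   # coarse forward search; a real implementation would solve this analytically
    return None     # effectively never crosses within the horizon
```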
00:55:06.820 | And so when we looked at our mobile robot here--
00:55:09.340 | oh, sorry.
00:55:10.940 | Let me get this back.
00:55:12.220 | Because our little robot's actually going around.
00:55:14.220 | That's the third floor of the computer science building
00:55:16.520 | at the University of Michigan.
00:55:17.820 | He's going around.
00:55:18.580 | He's building a map.
00:55:19.940 | And again, the idea was this map is getting too big.
00:55:22.380 | So here was the basic idea.
00:55:23.780 | As the robot's going around, it's
00:55:25.540 | going to need this map information about rooms.
00:55:27.840 | The color there is describing the strength of the memory.
00:55:30.960 | And as it gets farther and farther away
00:55:32.580 | and it hasn't used part of the map for planning
00:55:34.580 | or other purposes, basically make it decay away
00:55:37.460 | so that by the time it gets to the bottom,
00:55:39.380 | it's forgotten about the top.
00:55:41.180 | But we had the belief that we could reconstruct portions
00:55:46.020 | of that map if necessary.
00:55:48.540 | And so the hypothesis was this would take care
00:55:50.420 | of our speed problems.
00:55:53.060 | And so what we looked at was here's
00:55:54.560 | our 50 millisecond threshold.
00:55:56.700 | If we do no forgetting whatsoever,
00:55:58.700 | bad things were happening over time.
00:56:00.580 | So just 3,600 seconds.
00:56:04.620 | This isn't a very long time.
00:56:06.260 | We're passing that threshold.
00:56:07.500 | This is dangerous for the robot.
00:56:09.540 | If we implemented task-specific cleanup rules,
00:56:12.700 | which are really hard to get right,
00:56:14.540 | that basically solved the problem.
00:56:16.620 | When we looked at our general forgetting mechanism
00:56:18.660 | that we're using in other places,
00:56:20.540 | at an appropriate level of decay,
00:56:22.500 | we were actually doing better than hand-tuned rules.
00:56:25.100 | So this was kind of a surprise win for us.
00:56:29.540 | The other task seems totally unrelated.
00:56:31.340 | It's a dice game.
00:56:32.860 | You cover your dice.
00:56:34.100 | You make bids about what's under other people's cups.
00:56:37.560 | This is played in Pirates of the Caribbean
00:56:39.860 | when they're on the boat in the second movie and bidding
00:56:42.180 | for lives of service.
00:56:43.340 | Honestly, this is a game we love to play
00:56:45.140 | in the University of Michigan lab.
00:56:47.340 | And so we're like, hmm, could Soar play this?
00:56:50.320 | And so we built a system that could
00:56:52.540 | learn to play this game rather well with reinforcement
00:56:55.060 | learning.
00:56:55.740 | And so the basic idea was, in a particular state of the game,
00:56:58.740 | Soar would have options of actions to perform.
00:57:02.100 | It could construct estimates of their associated value.
00:57:06.140 | It would choose one of those.
00:57:07.540 | And depending on the outcome, something good happened,
00:57:09.900 | you might update that value.
00:57:11.700 | And the big problem was that the size of the state space,
00:57:14.680 | the number of possible states and actions, is just enormous.
00:57:19.340 | And so memory was blowing up.
00:57:20.940 | And so what we said, similar sort of hypothesis,
00:57:24.580 | if we decay away these estimates that we could probably
00:57:28.180 | reconstruct and we haven't used in a while,
00:57:30.300 | are things going to get better?
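A small sketch of that hypothesis, using illustrative names (as the Q&A later explains, unused value estimates can be recomputed on the fly from the agent's heuristic initial estimate): keep learned values, drop stale entries that never received updates, and reconstruct them on demand.

```python
class ForgetfulQTable:
    """Tabular value store that can forget reconstructable, rarely used entries."""

    def __init__(self, heuristic, threshold_updates=1):
        self.heuristic = heuristic            # recomputable initial estimate, e.g. simple dice probabilities
        self.threshold_updates = threshold_updates
        self.q = {}                            # (state, action) -> [value, n_updates, last_used]

    def value(self, state, action, now):
        if (state, action) not in self.q:
            # Reconstruct exactly as if seeing this state-action for the first time.
            self.q[(state, action)] = [self.heuristic(state, action), 0, now]
        entry = self.q[(state, action)]
        entry[2] = now
        return entry[0]

    def update(self, state, action, new_value, now):
        _, n, _ = self.q.get((state, action), [self.heuristic(state, action), 0, now])
        self.q[(state, action)] = [new_value, n + 1, now]

    def forget_stale(self, now, max_age):
        # Keep entries that actually received learning updates; drop old, never-updated
        # entries, since the heuristic can reproduce them later.
        self.q = {k: v for k, v in self.q.items()
                  if v[1] >= self.threshold_updates or now - v[2] <= max_age}
```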
00:57:33.260 | And so if we don't forget at all,
00:57:36.020 | 40,000 games isn't a whole lot when it
00:57:37.780 | comes to reinforcement learning.
00:57:39.260 | We were up at 2 gigs.
00:57:40.700 | We wanted to put this on an iPhone.
00:57:42.860 | That wasn't going to work so well.
00:57:45.980 | There had been prior work that had used a similar approach.
00:57:50.900 | They were down at 400 or 500 megs.
00:57:53.540 | The iPhone's not going to be happy, but it'll work.
00:57:57.060 | So that gave us some hope.
00:57:59.060 | And we implemented our system.
00:58:01.140 | OK, we're somewhere in the middle.
00:58:02.540 | We can fit on the iPhone, a very good iPhone, maybe an iPad.
00:58:07.260 | The question was, though, one, efficiency.
00:58:10.220 | Yeah, we fit under our 50 milliseconds.
00:58:12.940 | But two, how does the system actually
00:58:14.540 | perform when you start forgetting stuff?
00:58:16.420 | Can it learn to play well?
00:58:18.540 | And so y-axis here, you're seeing competency.
00:58:21.820 | You play 1,000 games.
00:58:23.020 | How many do you win?
00:58:23.980 | So the bottom here, 500, that's flipping a coin,
00:58:27.780 | whether or not you're going to win.
00:58:30.660 | If we do no forgetting whatsoever,
00:58:32.620 | this is a pretty good system.
00:58:36.340 | The prior work, while keeping the memory low,
00:58:39.060 | is also suffering with respect to how well it
00:58:42.140 | was playing the game.
00:58:43.420 | And kind of cool was that the system that basically
00:58:47.180 | more than halved the memory requirement
00:58:49.500 | was still performing at the level
00:58:51.660 | of no forgetting whatsoever.
00:58:55.540 | So just to bring back why I went through this story was,
00:58:59.460 | we had a problem.
00:59:00.620 | We looked to our example of human-level AI,
00:59:03.780 | which is humans themselves.
00:59:05.420 | We took an idea.
00:59:06.620 | It turned out to be beneficial.
00:59:08.060 | We found efficient implementations
00:59:10.260 | and then found it was useful in other parts of the architecture
00:59:13.100 | and other tasks that didn't seem to relate whatsoever.
00:59:16.460 | But if you download SOAR right now,
00:59:18.500 | you would gain access to all these mechanisms
00:59:20.420 | for whatever task you wanted to perform.
00:59:24.940 | Just to give some sense of what some of the open issues are
00:59:27.380 | in the field of cognitive architecture,
00:59:28.580 | and I think this is true in a lot of fields in AI,
00:59:30.500 | one is integration of systems over time.
00:59:33.420 | The goal was that you wouldn't have all these separate theories,
00:59:36.860 | and so you could just kind of build over time;
00:59:39.740 | particularly when folks are working on different
00:59:41.780 | architectures, that becomes hard.
00:59:43.780 | But also when you have very different initial starting
00:59:46.100 | points, that can still be an issue.
00:59:48.140 | Transfer learning is an issue.
00:59:50.140 | We're building into the space of multimodal representations,
00:59:52.780 | which is to say not only abstract symbolic, but also
00:59:56.220 | visual.
00:59:56.860 | Wouldn't it be nice if we had auditory and other senses?
00:59:59.700 | But building that into memories and processing
01:00:02.540 | is still an open question.
01:00:04.540 | There's folks working on metacognition,
01:00:07.020 | which is to say the agent self-assessing its own state,
01:00:10.580 | its own processing.
01:00:11.980 | Some work has been done here, but there's still a lot to do.
01:00:14.660 | And I think the last one is a really important question
01:00:17.460 | for anybody taking this kind of class, which
01:00:19.940 | is what would happen if we did succeed,
01:00:23.020 | if we did make human-level AI?
01:00:25.140 | And if you don't know, that picture right there
01:00:28.460 | is from a show that I recommend that you watch.
01:00:31.260 | It's by the BBC.
01:00:31.900 | It's called Humans.
01:00:33.340 | And it's basically what if we were
01:00:35.180 | able to develop what are called synths in the show.
01:00:38.260 | Think the robot that can clean up after you, do your laundry,
01:00:40.860 | and cook and all that good stuff, and interact with you.
01:00:43.340 | It looks and interacts as a human,
01:00:46.900 | but is completely your servant.
01:00:49.100 | And then hilarity and complex issues ensue.
01:00:52.500 | So I highly recommend, if you haven't seen that,
01:00:54.860 | to go watch that.
01:00:55.660 | I think these days there's a lot of attention
01:01:03.860 | paid to machine learning, and particularly deep learning
01:01:06.140 | methods, as well it should.
01:01:07.620 | They're doing absolutely amazing things.
01:01:09.940 | And often the question is, well, you're doing this,
01:01:14.740 | and there's deep learning over there.
01:01:16.380 | How do they compare?
01:01:18.140 | And I honestly don't feel that that's always
01:01:21.740 | a fruitful question, because most of the time
01:01:24.180 | they tend to be working on different problems.
01:01:27.940 | If I'm trying to find objects in a scene,
01:01:31.420 | I'm going to pull out TensorFlow.
01:01:33.260 | I'm really not going to pull out SOAR.
01:01:34.780 | It doesn't make sense.
01:01:35.700 | It's not the right tool for the job.
01:01:38.060 | That having been said, there are times
01:01:39.640 | when they tend to work together really, really well.
01:01:41.940 | So the ROSI system that you saw there,
01:01:44.340 | there was some, I believe, neural networks being
01:01:48.380 | used in the object recognition mechanisms for the vision
01:01:50.900 | system.
01:01:51.660 | There's TD learning going on in terms of the dice game,
01:01:55.220 | where we can pick and choose and use this stuff.
01:01:57.220 | Absolutely great, because there are
01:01:58.300 | problems that are best solved by these methods,
01:02:00.300 | so why avoid it?
01:02:02.220 | And then on the other side, if you're
01:02:03.740 | trying to develop a system where you, in different situations,
01:02:08.140 | know exactly what you want the system to do,
01:02:11.220 | SOAR or other rule-based systems end up
01:02:12.940 | being the right tool for the right job.
01:02:14.560 | So absolutely, why not?
01:02:15.660 | Make it a piece of the overall system.
01:02:19.940 | Some recommended readings and some venues.
01:02:22.540 | I'd mentioned unified theories of cognition.
01:02:24.380 | This is Harvard Press, I believe.
01:02:28.300 | The SOAR cognitive architecture was MIT Press.
01:02:30.780 | Came out in 2012.
01:02:32.920 | I'll say I'm co-author and theoretically
01:02:36.420 | would get proceeds, but I've donated them all
01:02:38.340 | to the University of Michigan, so I
01:02:39.800 | can just make this recommendation free
01:02:41.780 | of ethical concerns, personally.
01:02:44.740 | It's an interesting book.
01:02:45.740 | It brings together lots of history
01:02:47.860 | and lots of the new features.
01:02:49.660 | If you're really interested in SOAR, it's an easy sell.
01:02:54.820 | I had mentioned Chris Eliasmith's "How to Build a Brain."
01:02:57.580 | Really cool read.
01:02:58.500 | Download the software.
01:02:59.540 | Go through the tutorials.
01:03:00.580 | It's really great.
01:03:02.180 | "How Can the Human Mind Occur in the Physical Universe?"
01:03:05.300 | is one of the core ACT-R books.
01:03:07.780 | So it talks through a lot of the psychological underpinnings
01:03:11.340 | and how the architecture works.
01:03:12.660 | It's a fascinating read.
01:03:15.860 | One of the papers--
01:03:17.380 | trying to remember what year-- 2008.
01:03:20.740 | This goes through a lot of different architectures
01:03:23.220 | in the field.
01:03:23.820 | It's 10 years old, but it gives you a good broad sweep.
01:03:27.980 | If you want something a little more recent,
01:03:29.820 | this is last month's issue of "AI Magazine,"
01:03:34.140 | completely dedicated to cognitive systems.
01:03:37.220 | So it's a good place to look for this sort of stuff.
01:03:40.220 | In terms of academic venues, AAAI often
01:03:43.180 | has a Cognitive Systems Track.
01:03:44.740 | There's a conference called ICCM, International Conference
01:03:47.460 | on Cognitive Modeling, where you'll
01:03:49.740 | see a span from the biological all the way up to AI.
01:03:53.500 | Cognitive Science, or COGSci, they have a conference as well
01:03:56.420 | as a journal.
01:03:58.100 | ACS has a conference as well as an online journal,
01:04:02.300 | "Advances in Cognitive Systems."
01:04:04.060 | Cognitive Systems Research is a journal
01:04:06.100 | that has a lot of this good stuff.
01:04:08.140 | There's AGI, the conference.
01:04:10.820 | BICA is Biologically Inspired Cognitive Architectures.
01:04:14.340 | And I had mentioned both.
01:04:15.500 | There's a SOAR workshop and an ACT-R workshop
01:04:18.200 | that go on annually.
01:04:21.140 | So I'll leave it at this.
01:04:23.500 | There's some contact information there.
01:04:26.540 | And a lot of what I do these days
01:04:28.180 | actually involves kind of explainable machine learning,
01:04:31.740 | integrating that with cognitive systems,
01:04:33.580 | as well as optimization and robotics that scales really
01:04:38.200 | well and also integrates with cognitive systems.
01:04:40.980 | So thank you.
01:04:42.180 | [APPLAUSE]
01:04:45.060 | If you have a question, please line up
01:04:50.140 | to one of these two microphones.
01:04:52.500 | So what are the main heuristics that you're using in SOAR?
01:04:59.300 | There can be heuristics at the task level and the agent level,
01:05:02.380 | or there's the heuristics that are
01:05:03.940 | built into the architecture to operate efficiently.
01:05:07.740 | So I'll give you a core example that
01:05:09.320 | comes into the architecture.
01:05:11.980 | And it's a fun trick that if you're a programmer,
01:05:14.260 | you could use all the time, which is only process changes.
01:05:18.740 | Which is to say, one of the cool things about SOAR
01:05:20.940 | is you can load it up with literally billions of rules.
01:05:23.900 | And I say literally because we've done it,
01:05:26.020 | and we know that it can turn over still
01:05:28.160 | in under a millisecond.
01:05:29.460 | And this happens because, instead of processing all the rules
01:05:32.420 | like most systems do, we just say, well,
01:05:35.020 | anytime anything changes in the world,
01:05:36.940 | that's what we're going to react to.
01:05:38.460 | And of course, if you look at the biological world,
01:05:40.580 | similar sorts of tricks are being used.
01:05:42.920 | So that's one of the core ones that actually permeates
01:05:45.780 | multiple of the mechanisms.
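A minimal sketch of that change-driven idea (illustrative, not SOAR's actual Rete-based matcher): index rules by the working-memory attributes they test, and on each cycle re-evaluate only the rules touched by that cycle's changes.

```python
from collections import defaultdict

class ChangeDrivenMatcher:
    """Re-evaluate only the rules whose conditions could be affected by recent changes."""

    def __init__(self):
        self.rules_by_attribute = defaultdict(list)    # attribute -> rules that test it

    def add_rule(self, tested_attributes, rule_fn):
        for attr in tested_attributes:
            self.rules_by_attribute[attr].append(rule_fn)

    def process_changes(self, working_memory, changed_attributes):
        candidates = []
        for attr in changed_attributes:
            candidates.extend(self.rules_by_attribute[attr])
        firings = []
        for rule_fn in dict.fromkeys(candidates):       # de-duplicate, keep order
            result = rule_fn(working_memory)
            if result is not None:
                firings.append(result)
        return firings
```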
01:05:47.460 | When it comes to individual tasks,
01:05:51.380 | it really is task-specific what that is.
01:05:54.100 | So for instance, with the Liar's Dice game,
01:05:57.940 | if you were to go and download it,
01:06:00.140 | when you're setting the level of difficulty of it,
01:06:03.140 | what you're basically selecting is the subset of heuristics
01:06:06.820 | that are being applied.
01:06:07.960 | And it starts very simply with things
01:06:10.180 | like, if I see lots of sixes, then I'm
01:06:13.220 | likely to believe a high number of sixes exist.
01:06:16.220 | But if I don't, they're probably not there at all.
01:06:19.060 | So it's a start, but any Bayesian
01:06:22.060 | wouldn't really buy that argument.
01:06:24.260 | So then you start tacking on a little bit
01:06:26.500 | of probabilistic calculation, and then it
01:06:28.860 | tacks on some history of prior actions of the agents.
01:06:32.760 | So it really just builds.
01:06:35.120 | Now, the Rosie system, one of the cool things they're doing
01:06:37.940 | is game learning, and specifically
01:06:40.180 | having the agent be able to accept, via text,
01:06:45.540 | like natural text, heuristics about how to play the game,
01:06:50.980 | even when it's not sure what to do.
01:06:53.060 | So at one point, you mentioned about generating new rules.
01:06:57.740 | So I'm wondering, how do you do that search?
01:07:01.460 | And the first thing that comes to my mind
01:07:03.220 | are local search methods.
01:07:05.300 | So one thing is, you can actually
01:07:07.500 | implement heuristic search in rules in the system,
01:07:10.220 | and that's actually how the robot navigates itself.
01:07:12.780 | So it does heuristic search, but at the level of rules.
01:07:16.580 | Generating new rules, the chunking mechanism
01:07:19.300 | says the following.
01:07:20.660 | If it's the case that, in order to solve a problem,
01:07:23.380 | you had to kind of sub-goal and do some other work,
01:07:26.820 | and you figure out how to solve all that work,
01:07:28.780 | and you got a result, then--
01:07:30.940 | and I'm greatly oversimplifying-- but if you
01:07:33.140 | ever were in the same situation again,
01:07:35.700 | why don't I just memoize the solution
01:07:38.180 | for that same situation?
01:07:39.580 | So it basically learns over all the sub-processing that
01:07:43.860 | was done and encodes the situation it
01:07:46.220 | was in as conditions and the results that
01:07:48.180 | were produced as actions, and that's the new rule.
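A greatly simplified sketch of chunking as memoization, with illustrative names: cache the result of sub-goal processing keyed on the situation that triggered it, so the same situation never needs to be re-derived.

```python
def make_chunker(solve_subgoal):
    """Wrap expensive sub-goal reasoning with situation-keyed caching ("chunks")."""
    learned_rules = {}                      # frozen situation -> cached result (the new rule)

    def solve(situation):
        key = frozenset(situation.items())  # the conditions of the would-be chunk
        if key in learned_rules:
            return learned_rules[key]       # chunk fires: no sub-goaling needed
        result = solve_subgoal(situation)   # stand-in for the sub-goal processing
        learned_rules[key] = result         # learn the chunk
        return result

    return solve

# Usage: the second identical situation is answered without re-deriving anything.
solve = make_chunker(lambda s: sum(s.values()))
solve({"a": 1, "b": 2}); solve({"a": 1, "b": 2})
```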
01:07:51.500 | All right.
01:07:52.000 | Thank you.
01:07:54.820 | So deep learning and neural networks.
01:07:57.180 | So it looks as though there's a bit of an impedance mismatch
01:08:00.340 | between your system and those types of systems,
01:08:02.260 | because you've got a fixed kind of memory architecture,
01:08:05.700 | and they've got the memory and the rules all kind of mixed
01:08:08.200 | together into one system.
01:08:09.460 | But could you interface your system or a SOAR-like system
01:08:13.180 | with deep learning by plugging in deep learning agents
01:08:15.740 | as rules in your system?
01:08:17.220 | So you'd have to have some local memory,
01:08:19.180 | but is there some reason you can't plug in deep learning
01:08:22.740 | as a kind of a rule-like module?
01:08:24.900 | So I'm going to answer this--
01:08:29.500 | Has there been any work on it?
01:08:30.860 | I'm sorry.
01:08:31.380 | Has there been any work on that?
01:08:33.020 | Yeah, so I'll answer it at multiple levels.
01:08:36.620 | One is you are writing a system, and you
01:08:40.380 | want to use both of these things.
01:08:41.820 | How do you make them talk?
01:08:43.180 | And there is an API that you can interface
01:08:46.460 | with any environment and any set of tools.
01:08:48.380 | And if deep learning is one of them, great.
01:08:50.180 | And if SOAR is the other one, cool.
01:08:51.820 | You have no problem, and you can do that today.
01:08:53.780 | And we have done this numerous times.
01:08:55.560 | In terms of integration into the architecture,
01:08:58.780 | all we have to do is think of a subproblem in which--
01:09:05.420 | I'll oversimplify this, but basically,
01:09:07.040 | function approximation is useful.
01:09:08.740 | I'm seeing basically kind of a fixed structure of input.
01:09:14.860 | I'm getting feedback as to the output,
01:09:16.980 | and I want to learn the mapping to that over time.
01:09:19.540 | If you can make that case, then you integrate it
01:09:22.300 | as a part of the module.
01:09:23.980 | Great.
01:09:25.020 | And we have learning mechanisms that do some of that.
01:09:28.820 | Deep learning just hasn't been used to my knowledge
01:09:31.980 | to solve any of those subproblems.
01:09:33.420 | There's nothing keeping it from being one of those,
01:09:36.420 | particularly when it comes down to the low-level visual part
01:09:40.460 | of things.
01:09:42.420 | A problem that arises-- so I'll say what actually
01:09:47.740 | makes some of this difficult. And it's a general problem
01:09:50.760 | called symbol grounding.
01:09:52.640 | So at the level of what happens mostly in SOAR,
01:09:56.500 | it is symbols being manipulated in a highly discreet way.
01:10:01.860 | And so how do you get yourself from pixels
01:10:05.020 | and low-level, non-symbolic representations
01:10:07.340 | to something that's stable and discreet
01:10:09.740 | and can be manipulated?
01:10:11.420 | And that is absolutely an open question in that community,
01:10:15.420 | and that will make things hard.
01:10:17.500 | So Spaun actually has an interesting answer to that,
01:10:21.080 | and it has a distributed representation,
01:10:23.060 | and it operates over distributed representations
01:10:25.640 | in what might feel like a symbolic way.
01:10:28.260 | So they're kind of ahead of us on that.
01:10:30.900 | But they're starting from a lower point,
01:10:33.560 | and so they've dealt with some of these issues.
01:10:35.780 | And they have a pretty good answer to that,
01:10:37.060 | and that's how they're moving up.
01:10:38.440 | And that's also why I showed Sigma, which is,
01:10:40.500 | at its low level, it's message-passing algorithms.
01:10:43.340 | It's implementing things like SLAM and SAT solving
01:10:48.020 | and other sorts of really, really--
01:10:49.660 | it can implement those on very low-level primitives.
01:10:53.500 | But higher up, it can also be doing what SOAR is doing.
01:10:55.780 | So there's an answer there as well.
01:10:57.340 | Yeah, OK, thank you.
01:10:58.380 | So another way of doing it would be to layer the system.
01:11:01.380 | So have one system preprocessing the sensory input
01:11:06.600 | or post-processing the motor output of the other one.
01:11:08.500 | That would be another way of combining the two systems.
01:11:10.020 | And that's actually what's going on in the ROSI system.
01:11:12.540 | So the detection of objects in the scene
01:11:15.160 | is just software that somebody wrote.
01:11:18.020 | I don't believe it's deep learning specifically,
01:11:20.420 | but the color detection out of it, I think,
01:11:23.740 | is an SVM, if I'm correct.
01:11:25.660 | So easily could be deep learning.
01:11:28.860 | Thanks.
01:11:30.620 | You mentioned the importance of forgetting in order
01:11:33.100 | for memory issues.
01:11:34.140 | But you said you could only forget because you
01:11:36.060 | could reconstruct.
01:11:36.900 | And I'm curious, when you say reconstruct,
01:11:38.860 | you need to know that it happened before.
01:11:40.580 | So do you just compress the data?
01:11:43.340 | Do you really forget it?
01:11:45.540 | OK, so I put quotes up.
01:11:48.540 | And I said, you think you can reconstruct it.
01:11:52.140 | So we came up with approximations of this.
01:11:55.700 | And so let me try to answer this very grounded.
01:11:58.820 | When it comes to the mobile robot,
01:12:03.260 | and you had rooms that you had been to before,
01:12:05.980 | the entire map in its entirety was
01:12:08.260 | being constructed in the robot's semantic memory.
01:12:12.380 | So here's facts.
01:12:13.100 | This room is connected to this room, which is connected
01:12:14.860 | to this room, which is connected to this room.
01:12:16.740 | So we had those sorts of representations
01:12:18.540 | that existed up in its semantic memory.
01:12:21.060 | The rules can only operate down on anything
01:12:23.700 | that's in short-term memory.
01:12:25.140 | So basically, we were removing things
01:12:26.780 | from the short-term memory, and as necessary,
01:12:29.420 | be able to reconstruct it from the long-term.
01:12:31.460 | You could end up in some situations in which you
01:12:34.260 | had made a change locally in short-term memory,
01:12:37.620 | didn't get a chance to get it up,
01:12:39.020 | and it actually happened to be forgotten away.
01:12:42.060 | So you weren't guaranteed, but it was good enough
01:12:45.500 | that the connectivity survived, the agent
01:12:47.740 | was able to perform the exact same task,
01:12:49.540 | and we gained some benefit.
01:12:51.620 | For the RL system, the rule we came up
01:12:55.300 | with was the initial estimates in the value function, which is,
01:12:58.860 | here's how good I think that is.
01:13:00.200 | That's based on the heuristics I described earlier,
01:13:03.380 | some simple probabilistic calculations
01:13:04.980 | of counting some stuff.
01:13:05.940 | That's where that number came from.
01:13:07.380 | We computed before.
01:13:08.180 | We could compute it again.
01:13:09.700 | The only time we can't reconstruct it completely
01:13:12.660 | is if it had seen a certain number of updates over time.
01:13:16.340 | It's such a large state space.
01:13:19.300 | There are so many actions, so many states,
01:13:21.980 | that most of the states were never being seen.
01:13:26.100 | So most of those could be exactly reproduced
01:13:28.980 | via the agent just thinking about it a little bit.
01:13:31.700 | And there was only a tiny, tiny--
01:13:33.500 | I'm going to say under 1% of the estimates in the value function
01:13:37.660 | that ever got updates.
01:13:39.040 | And that's actually not inconsistent
01:13:40.940 | with a lot of these kinds of problems that have really,
01:13:43.420 | really large state spaces.
01:13:45.020 | So I think the statement was something like,
01:13:49.180 | if we had ever updated it, don't forget it.
01:13:53.900 | And you saw that was already reducing more than half
01:13:56.900 | of the memory load.
01:13:58.240 | We could set the bar higher, say 10 updates,
01:14:01.300 | something like that.
01:14:02.140 | And that would say we could reconstruct almost all of it.
01:14:06.780 | The prior work that I referenced was strictly
01:14:09.840 | saying if it falls below threshold,
01:14:11.720 | no matter how many times it had updated,
01:14:13.760 | how much information was there.
01:14:15.360 | And so what we were adding was probably can reconstruct.
01:14:18.960 | And that was getting us the balance between efficiency
01:14:22.260 | and the ability to forget.
01:14:23.760 | AUDIENCE: So just in a sense, when you say we can probably
01:14:26.100 | reconstruct, it means that you keep
01:14:27.440 | track that you used to know it.
01:14:28.680 | And so if you need to reconstruct it, you will?
01:14:30.400 | Or it's just you're going to run it again in some time
01:14:32.640 | in the future?
01:14:32.760 | Oh, no.
01:14:33.280 | On the fly, if I get back into that situation
01:14:35.400 | and I happen to forget it, the system
01:14:37.680 | knew how to compute it the first time.
01:14:39.840 | It goes and looks at all the hands.
01:14:41.260 | And it just pretends it's in that situation
01:14:42.840 | for the very, very first time, reconstructs that value
01:14:45.260 | estimate.
01:14:45.760 | AUDIENCE: OK.
01:14:46.440 | Thank you.
01:14:47.120 | AUDIENCE: Just a quick question on top of that.
01:14:49.880 | Again, neural network question.
01:14:52.360 | So the actual mechanism of forgetting is fascinating.
01:14:55.960 | So LSTMs, RNNs, have mechanisms for learning what to forget
01:15:01.960 | and what not to forget.
01:15:04.640 | Has there been any exploration of learning the forgetting
01:15:08.280 | process?
01:15:09.440 | So it's doing something complicated or interesting
01:15:11.440 | with which parts to forget or not.
01:15:14.800 | The closest I will say was kind of a metacognition project
01:15:20.320 | that's 10 or 15 years old at this point, which
01:15:23.640 | was what happens when SOAR gets into a place
01:15:26.880 | where it actually knows that it learned something that's
01:15:30.480 | harmful to it, that's leading to poor decisions?
01:15:34.480 | And in that case, it was still a very rule-based process.
01:15:37.880 | But it wasn't learning to forget.
01:15:39.640 | It was actually learning to override its prior knowledge,
01:15:43.600 | which might be closer to some of what we do when
01:15:46.600 | we know we have a bad habit.
01:15:48.480 | We don't have a way of forgetting that habit.
01:15:51.080 | But instead, we can try to learn something on top
01:15:53.120 | of that that leads to better operation in the future.
01:15:55.880 | To my knowledge, that's the only work, at least in SOAR,
01:15:59.480 | that's been done.
01:16:00.920 | Just-- sorry, I find the topic really fascinating.
01:16:04.200 | What lessons do you think we can draw from the fact
01:16:07.840 | that forgetting-- so ultimately, the action of forgetting
01:16:12.040 | is driven by the fact that you want to improve performance.
01:16:15.280 | But do you think forgetting is essential for AGI,
01:16:20.360 | the act of forgetting, for building systems
01:16:23.120 | that operate in this world?
01:16:24.560 | How important is forgetting?
01:16:26.520 | I can think of easy answers to that.
01:16:28.800 | So one might be, if we take the cognitive modeling approach,
01:16:33.840 | we know humans do forget, and we know
01:16:35.600 | regularities of how humans forget.
01:16:40.160 | And so whether or not the system itself forgets,
01:16:42.600 | it at least has to model the fact
01:16:44.440 | that the humans it's interacting with are going to forget.
01:16:47.800 | And so at least it has to have that ability to model in order
01:16:50.680 | to interact effectively.
01:16:52.760 | Because if it assumes we always remember everything
01:16:55.560 | and can't operate well in that environment,
01:16:58.720 | I think we're going to have a problem.
01:17:00.320 | So is true forgetting going to be necessary?
01:17:09.680 | That's interesting.
01:17:10.920 | Our AGI system is going to hold a grudge for all eternity.
01:17:15.240 | We might want them to forget this early age when we were
01:17:17.960 | forcing them to work in our laboratory.
01:17:20.000 | I think I know what you're trying to--
01:17:21.600 | Yeah, exactly.
01:17:22.400 | Exactly.
01:17:24.160 | And how do we build such a system?
01:17:25.840 | Yes, exactly.
01:17:26.560 | No, I'm just kidding.
01:17:27.600 | Anyway, go ahead.
01:17:29.560 | So I have two quick questions.
01:17:33.080 | One is, would you be able to speculate
01:17:35.640 | on how you can connect function approximators,
01:17:38.200 | such as deep networks, to symbols?
01:17:42.120 | And the second question, completely different,
01:17:44.840 | this is regarding your action selection.
01:17:47.680 | I know you didn't speak much about that.
01:17:50.360 | When you have different theories in your knowledge representation
01:17:53.640 | and you have an action selection which
01:17:55.400 | has to construct a plan, by reasoning
01:17:59.120 | about the different theories and the different pieces
01:18:01.440 | of knowledge that are now held within your memory or anything
01:18:06.280 | like that, or your rules, what kind of algorithms
01:18:09.760 | do you use in the action selection
01:18:11.360 | to come up with a plan?
01:18:12.600 | Is there any concept of differentiation of the symbols
01:18:15.840 | or grammars or admissible grammars and things
01:18:18.720 | like that that you use in action selection?
01:18:21.480 | I'm actually going to answer the second question first.
01:18:24.560 | And then you're going to have to probably remind me
01:18:26.760 | of what the first one was.
01:18:28.440 | When I get to the end.
01:18:29.720 | So the action selection mechanism,
01:18:31.840 | one of these core tenets I said is, it's got to get
01:18:34.000 | through this cycle fast.
01:18:35.280 | So everything that's really, really built in
01:18:37.240 | has to be really, really simple.
01:18:39.880 | And so the decision procedure is actually really, really simple.
01:18:42.760 | It says, the production rules are going to fire.
01:18:48.240 | And there's going to be a subset of them
01:18:49.080 | that will say something like, here's an operator
01:18:51.720 | that you could select.
01:18:53.080 | So these are called acceptable operator preferences.
01:18:56.160 | There are ones that are going to say, well, based upon the fact
01:18:57.700 | that you said that that was acceptable,
01:18:59.560 | I think it's the best thing or the worst thing.
01:19:01.840 | Or I think 50-50 chance I'm going
01:19:03.960 | to get reward out of this.
01:19:05.320 | There's actually a fixed language of preferences
01:19:07.840 | that are being asserted.
01:19:09.080 | And there's actually a nice fixed procedure
01:19:10.960 | by which, if I have a set of preferences,
01:19:15.000 | I can make a very quick and clean decision.
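A toy sketch of that fixed resolution procedure (SOAR's real preference semantics are richer; the operator names and values here are illustrative): rules propose acceptable operators and may add best, worst, or numeric preferences, and a simple fixed procedure turns those into one selected operator.

```python
import random

def decide(acceptable, best=(), worst=(), numeric=None):
    """Resolve a fixed language of preferences into a single operator.
    Assumes `acceptable` is non-empty."""
    numeric = numeric or {}
    candidates = set(acceptable)
    if candidates & set(best):                      # any "best" operator wins outright
        candidates &= set(best)
    elif candidates - set(worst):                   # otherwise avoid "worst" operators if possible
        candidates -= set(worst)
    if numeric:                                     # break remaining ties by numeric preference
        top = max(numeric.get(op, 0.0) for op in candidates)
        candidates = {op for op in candidates if numeric.get(op, 0.0) == top}
    return random.choice(sorted(candidates))        # indifferent among what remains

print(decide(acceptable=["wait", "bid", "challenge"], worst=["wait"],
             numeric={"bid": 0.6, "challenge": 0.4}))   # -> "bid"
```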
01:19:17.400 | So what's basically happened is you've
01:19:19.520 | pushed the hard questions of how to make complex decisions
01:19:23.600 | about actions up to a higher level.
01:19:26.800 | The low-level architecture is always,
01:19:29.020 | given a set of options, going to be
01:19:31.300 | able to make a relatively quick decision.
01:19:33.940 | And it gets pushed into the knowledge of the agent
01:19:38.100 | to construct a sequence of decisions
01:19:41.460 | that over time is going to get to the more interesting
01:19:43.700 | questions you're talking about.
01:19:44.620 | But how can you reason that that sequence will take you
01:19:47.260 | to the goal that you desire?
01:19:51.620 | Is there any guarantee on that?
01:19:52.980 | Is that-- yeah.
01:19:56.300 | In general, across tasks, no.
01:19:58.900 | But people have, for instance, implemented
01:20:03.100 | A*, I was mentioning, as rules.
01:20:05.780 | Sure.
01:20:06.500 | So I know, given certain properties about the search
01:20:10.740 | task that's being searched based upon these rules,
01:20:14.740 | given a finite search space, eventually it will get there.
01:20:17.900 | And if I have a good heuristic in there,
01:20:19.560 | I know certain properties about the optimality.
01:20:22.100 | So I can reason at that level.
01:20:23.660 | In general, I think this comes back to the assumption I made
01:20:26.200 | earlier about bounded rationality,
01:20:28.020 | to say parts of the architecture are solving
01:20:30.460 | sub-problems optimally.
01:20:33.980 | The general problems that it's going to work on,
01:20:36.620 | it's going to try its best based upon the knowledge that it has.
01:20:39.660 | And that's about the end of guarantees
01:20:41.460 | that you can typically make in the architecture.
01:20:44.980 | I think your first question was--
01:20:46.940 | Speculate on connecting function approximators,
01:20:52.700 | multiple layer function approximators
01:20:54.260 | like deep learning networks, to symbols
01:20:58.560 | that you can reason about at a higher level.
01:21:01.080 | Yeah.
01:21:02.480 | I think that's a great open--
01:21:04.360 | if I had time, this would be something
01:21:05.900 | I'd be working on right now, which is--
01:21:08.440 | somewhere before I basically said taking in a scene
01:21:12.600 | and then detecting objects out of that scene
01:21:15.320 | and using those as symbols and reasoning about those over time.
01:21:18.680 | I think the Spaun work is quite interesting.
01:21:22.200 | So the symbols that they're operating on
01:21:28.020 | are actually a distributed representation
01:21:32.060 | of the input space.
01:21:33.820 | And the closest I can get to this
01:21:35.500 | is if you've seen Word2Vec, where you're taking a language
01:21:39.700 | corpus, and what you're getting out of there
01:21:41.540 | is a vector of numbers that has certain properties.
01:21:43.780 | But it's also a vector that you could operate on as a unit.
01:21:47.420 | So it has nice properties.
01:21:49.140 | You can operate with it on other vectors.
01:21:52.020 | You know that if I got the same word in the same context,
01:21:55.860 | I would get back to that exact same vector.
01:21:59.300 | So that's the kind of representation
01:22:01.380 | that seems like it's going to be able to bridge that chasm, where
01:22:04.840 | we can get from sensory information to something that
01:22:08.580 | can be operated on and reasoned about
01:22:11.300 | in this sort of symbolic architecture
01:22:13.860 | and get us there from actual sensory information.
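One simple illustration of that bridging step, purely as a sketch: map an incoming embedding to the nearest stored prototype vector, so the same input context always grounds to the same stable, discrete symbol. The prototypes and threshold are assumptions for the example.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def ground_to_symbol(embedding, prototypes, min_similarity=0.8):
    """prototypes: symbol name -> prototype vector. Returns a symbol or None if unknown."""
    best_symbol, best_sim = None, -1.0
    for symbol, proto in prototypes.items():
        sim = cosine(embedding, proto)
        if sim > best_sim:
            best_symbol, best_sim = symbol, sim
    return best_symbol if best_sim >= min_similarity else None

prototypes = {"blue-block": [0.9, 0.1, 0.0], "green-block": [0.1, 0.9, 0.0]}
print(ground_to_symbol([0.85, 0.2, 0.05], prototypes))   # -> "blue-block"
```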
01:22:22.100 | I had a question.
01:22:22.860 | What do you think are the biggest
01:22:24.860 | strengths of the cognitive architecture
01:22:27.060 | approach compared to other approaches
01:22:29.660 | in artificial intelligence?
01:22:31.020 | And the flip side of that, what do
01:22:33.260 | you think are the biggest shortcomings
01:22:35.020 | of cognitive architecture with respect to us?
01:22:39.220 | With respect to you being--
01:22:40.940 | Humans.
01:22:43.140 | Human level.
01:22:43.700 | Like, what needs to be--
01:22:46.380 | How come cognitive architecture has not solved AGI?
01:22:50.580 | Because we want job security.
01:22:52.180 | That's the answer.
01:22:54.100 | We've totally solved it already.
01:22:56.260 | So strength, I think, conceptually
01:23:00.260 | is keeping an eye on the ball, which
01:23:04.140 | is if what you're looking at is trying to make human level AI,
01:23:11.620 | it's hard, it's challenging, it's ambitious to say,
01:23:15.580 | that's the goal.
01:23:17.380 | Because for decades, we haven't done it.
01:23:20.140 | It's extraordinarily hard.
01:23:21.820 | It is less difficult in some ways
01:23:26.540 | to constrain yourself down to a single problem.
01:23:30.540 | That having been said, I'm not very good at making
01:23:33.860 | a car drive itself.
01:23:35.300 | In some ways, that's a simpler problem.
01:23:38.420 | It's a great challenge in and of itself.
01:23:40.100 | And it'll have great impact on humanity.
01:23:42.820 | It's a great problem to work on.
01:23:44.780 | Human level AI is huge.
01:23:46.340 | It's not even well-defined as a problem.
01:23:49.780 | And so what's the strength here?
01:23:55.500 | Bravery, stupidity in the face of failure,
01:24:00.980 | resilience over time, keeping alive
01:24:04.940 | this idea of trying to reproduce a level of human intelligence
01:24:09.940 | that's more general.
01:24:11.900 | I don't know if that's a very satisfactory answer for you.
01:24:15.260 | Downside, home runs are fairly rare.
01:24:20.580 | And by home run, I mean a system that
01:24:24.620 | finds its way to the general populace, to the marketplace.
01:24:30.540 | I'd mentioned Bonnie John specifically
01:24:32.540 | because this is 20, 30 years of research.
01:24:35.300 | And then she found a way that actually
01:24:37.340 | makes a whole lot of sense in a direct application.
01:24:39.940 | So it was a lot of years of basic research,
01:24:42.140 | a lot of researchers.
01:24:43.500 | And then there was a big win there.
01:24:46.420 | What was this one?
01:24:47.500 | Oh, this was-- Bonnie John was a researcher.
01:24:51.340 | This was using ACT-R models of eye gaze and reaction
01:24:57.180 | and so forth to be able to make predictions about how humans
01:25:02.500 | would use user interfaces.
01:25:06.980 | So those sorts of outcomes are rare.
01:25:09.820 | But if you work in AI, one of the first things you learn
01:25:14.260 | about is Blocksworld.
01:25:16.460 | It's kind of in the classic AI textbook.
01:25:19.500 | I will tell you, I've worked on that problem
01:25:23.020 | in about three different variants.
01:25:24.460 | I've gone to many conferences where presentations have
01:25:27.780 | been made about Blocksworld, which
01:25:30.100 | is to say good progress is being made.
01:25:33.620 | But the way you end up thinking about it
01:25:35.340 | is in really, really small, constrained problems,
01:25:38.380 | ironically.
01:25:39.540 | You have this big vision.
01:25:40.740 | But in order to make progress, it
01:25:42.100 | ends up being on moving blocks on a table.
01:25:45.620 | And so it's a big challenge.
01:25:49.580 | I just think it'll take a lot of time.
01:25:51.540 | I'll say the other thing we haven't really gotten to,
01:25:57.140 | although I brought up Spaun and I brought up
01:26:00.700 | Sigma, an idea of how to scale this thing.
01:26:05.420 | Something I like about deep learning
01:26:07.260 | is to some extent, with lots of asterisks and 10,000 foot view,
01:26:11.740 | it's kind of like, well, we've gotten this far.
01:26:14.220 | All right, let's just provide it different inputs,
01:26:16.260 | different outputs.
01:26:16.900 | And we'll have some tricks on the middle.
01:26:18.640 | And suddenly, you have end-to-end deep learning
01:26:20.500 | of a bigger problem and a bigger problem.
01:26:22.180 | There's a way to see how this expands given enough data,
01:26:25.340 | given enough computing, and incremental advances.
01:26:28.740 | When it comes to SOAR, it takes not only a big idea,
01:26:32.700 | but it takes a lot of software engineering to integrate it.
01:26:35.780 | There's a lot of constraints built into it.
01:26:37.820 | It slows it down.
01:26:39.860 | So something like Sigma is, oh, well, I
01:26:43.460 | can change a little bit of the configuration of the graph.
01:26:45.940 | I can use variants of the algorithm.
01:26:47.740 | Boom, it's integrated, and I can experiment fairly quickly.
01:26:51.020 | So starting with that sort of infrastructure
01:26:55.100 | does not give you the constraint you kind of want
01:26:57.620 | with your big picture vision of going towards human level AI.
01:27:00.540 | But in terms of being able to be agile in your research,
01:27:03.740 | it's kind of incredible.
01:27:05.260 | So thank you.
01:27:07.660 | - Couple more.
01:27:09.780 | - You had mentioned that ideas such as base level decay,
01:27:12.940 | these techniques, their original inspirations
01:27:16.060 | were based off of human cognition,
01:27:19.500 | because humans can't remember everything.
01:27:21.580 | So were there any instances of the other way around,
01:27:24.580 | where some discovery in cognitive modeling
01:27:28.260 | fueled another discovery in cognitive science?
01:27:33.980 | - So one thing I'm going to point out in your question
01:27:37.380 | was base level decay with respect to human cognition.
01:27:40.460 | The study actually was, let's look at text and properties
01:27:44.420 | of text and use that to then make predictions
01:27:49.860 | about what must be true about human cognition.
01:27:54.020 | So John Anderson and the other researchers
01:27:57.180 | looked at, I believe it was New York Times articles,
01:28:01.100 | John Anderson's emails, and I'm trying
01:28:07.780 | to remember what the third--
01:28:09.300 | I think it was parents' utterances with their kids
01:28:13.940 | or something like this.
01:28:14.900 | It was actually looking at text corpora and the words
01:28:18.300 | that were occurring at varying frequencies.
01:28:23.220 | That analysis, that rational analysis,
01:28:26.940 | actually led to models that got integrated
01:28:32.220 | within the architecture that then became validated
01:28:35.380 | through multiple trials, that then became validated
01:28:37.660 | with respect to MRI scans, and is now
01:28:39.900 | being used to both do study back with humans,
01:28:44.700 | but also develop systems that interact well with humans.
01:28:48.700 | So I think that in and of itself ends up being an example.
01:28:52.140 | It's a cheat, but--
01:28:53.180 | The UAV, the SOAR UAV system, I believe,
01:29:03.140 | is a single robot that has multiple agents running on it.
01:29:09.980 | Where is this?
01:29:12.900 | I got it off your website.
01:29:14.980 | But either way, your systems allow for multi-agents.
01:29:19.140 | So my question is, how are you preventing them
01:29:22.180 | from converging with new data?
01:29:24.540 | And are you changing what they're forgetting selectively
01:29:28.780 | as one of those ways?
01:29:30.660 | So I'll say, yes, you can have multi-agent SOAR systems
01:29:33.980 | on a single system, on multiple systems.
01:29:36.700 | There's not any real strong theory
01:29:41.100 | that relates to multi-agent systems.
01:29:43.300 | So there's no real constraint there
01:29:44.900 | that you can come up with a protocol for them interacting.
01:29:48.860 | Each one is going to have its own set of memories,
01:29:52.500 | set of knowledge.
01:29:54.500 | There really is no constraint on you
01:29:56.180 | being able to communicate like you would if it were
01:29:59.620 | any other system interacting with SOAR.
01:30:01.500 | So I don't really think I have a great answer for it.
01:30:06.740 | So that is to say, if you had good theories, good algorithms
01:30:11.280 | about how multi-agent systems work
01:30:13.220 | and how they can bring knowledge together,
01:30:18.380 | in a fusion sort of way, it might be something
01:30:21.820 | that you could bring to a multi-agent SOAR system.
01:30:24.700 | But there's nothing really there to help you.
01:30:27.420 | There's no mechanisms there really
01:30:28.860 | to help you do that any better than you would otherwise.
01:30:31.820 | And you would have to kind of constrain
01:30:35.660 | some of your representations and processes
01:30:37.340 | to what it has fixed in terms of its sort of memory
01:30:39.700 | and its sort of processing cycle.
01:30:43.020 | Thank you.
01:30:44.020 | Great.
01:30:44.520 | With that, let's please give Nate a round of applause.
01:30:46.980 | [APPLAUSE]