David Ferrucci: AI Understanding the World Through Shared Knowledge Frameworks | AI Podcast Clips

00:00:00.000 | - Do you think that shared knowledge,
00:00:04.380 | if we can maybe escape the hardware question,
00:00:09.240 | how much is encoded in the hardware,
00:00:11.100 | just the shared knowledge in the software,
00:00:12.880 | the history, the many centuries of wars and so on
00:00:16.960 | that came to today, that shared knowledge,
00:00:19.640 | how hard is it to encode?
00:00:25.020 | Do you have a hope?
00:00:27.480 | Can you speak to how hard is it to encode that knowledge
00:00:30.960 | systematically in a way that could be used by a computer?
00:00:34.420 | - So I think it is possible to learn for a machine,
00:00:37.960 | to program a machine to acquire that knowledge
00:00:41.240 | with a similar foundation.
00:00:43.080 | In other words, a similar interpretive foundation
00:00:47.760 | for processing that knowledge.
00:00:49.680 | - What do you mean by that?
00:00:50.720 | - So in other words, we view the world in a particular way.
00:00:56.040 | So in other words, we have, if you will,
00:01:00.480 | as humans, we have a framework
00:01:01.760 | for interpreting the world around us.
00:01:03.920 | So we have multiple frameworks
00:01:06.440 | for interpreting the world around us.
00:01:07.640 | But if you're interpreting, for example,
00:01:11.440 | sociopolitical interactions,
00:01:13.020 | you're thinking about where there's people,
00:01:14.800 | there's collections and groups of people,
00:01:17.200 | they have goals, goals are largely built around survival
00:01:20.120 | and quality of life,
00:01:23.000 | there's fundamental economics around scarcity of resources.
00:01:28.000 | And when humans come and start interpreting
00:01:31.320 | a situation like that,
00:01:32.360 | because you brought up like historical events,
00:01:35.280 | they start interpreting situations like that.
00:01:37.160 | They apply a lot of this fundamental framework
00:01:41.160 | for interpreting that.
00:01:42.400 | Well, who are the people?
00:01:43.920 | What were their goals?
00:01:45.000 | What resources did they have?
00:01:46.680 | How much power or influence did they have over the other?
00:01:48.720 | Like this fundamental substrate, if you will,
00:01:52.200 | for interpreting and reasoning about that.
00:01:54.440 | So I think it is possible to imbue a computer
00:01:58.600 | with that stuff that humans like take for granted
00:02:02.320 | when they go and sit down and try to interpret things.
00:02:06.000 | And then with that foundation, they acquire,
00:02:10.520 | they start acquiring the details,
00:02:12.000 | the specifics in a given situation,
00:02:14.480 | are then able to interpret it with regard to that framework.
00:02:17.360 | And then given that interpretation,
00:02:19.120 | they can do what?
00:02:20.360 | They can predict.
00:02:22.000 | But not only can they predict,
00:02:23.880 | they can predict now with an explanation
00:02:26.480 | that can be given in those terms,
00:02:29.600 | in the terms of that underlying framework
00:02:31.880 | that most humans share.
00:02:34.000 | Now you can find humans that come and interpret events
00:02:36.280 | very differently than other humans,
00:02:37.960 | because they're like using a different framework.
00:02:41.840 | You know, the movie "Matrix" comes to mind,
00:02:44.160 | where they decided humans were really just batteries.
00:02:48.080 | And that's how they interpreted the value of humans
00:02:51.600 | as a source of electrical energy.
00:02:53.300 | So, but I think that, you know, for the most part,
00:02:57.120 | we have a way of interpreting the events,
00:03:02.120 | or the social events around us,
00:03:03.920 | because we have this shared framework.
00:03:05.800 | It comes from, again, the fact that we're similar beings
00:03:10.360 | that have similar goals, similar emotions,
00:03:12.760 | and we can make sense out of these.
00:03:14.580 | These frameworks make sense to us.
00:03:16.680 | - So how much knowledge is there, do you think?
00:03:19.720 | So you said it's possible.
00:03:21.280 | - There's a tremendous amount of detailed knowledge
00:03:23.760 | in the world.
00:03:24.600 | There are, you know, you can imagine, you know,
00:03:27.880 | effectively infinite number of unique situations
00:03:31.000 | and unique configurations of these things.
00:03:33.760 | But the knowledge that you need,
00:03:36.780 | what I refer to as like the frameworks,
00:03:39.240 | that you need for interpreting them, I don't think.
00:03:41.240 | I think those are finite.
00:03:43.160 | - You think the frameworks are more important
00:03:46.680 | than the bulk of the knowledge?
00:03:48.440 | So like framing--
00:03:49.400 | - Yeah, because what the frameworks do
00:03:50.880 | is they give you now the ability to interpret and reason,
00:03:53.240 | and to interpret and reason it,
00:03:54.760 | to interpret and reason over the specifics
00:03:58.440 | in ways that other humans would understand.
00:04:00.880 | - What about the specifics?
00:04:02.920 | - You acquire the specifics by reading
00:04:05.640 | and by talking to other people.
00:04:07.200 | - So I'm mostly, actually, just even,
00:04:09.360 | if we can focus on even the beginning,
00:04:11.920 | the common sense stuff,
00:04:13.160 | the stuff that doesn't even require reading,
00:04:15.080 | or it almost requires playing around with the world
00:04:18.520 | or something.
00:04:19.360 | Just being able to sort of manipulate objects,
00:04:22.480 | drink water and so on, all of that.
00:04:25.560 | Every time we try to do that kind of thing
00:04:27.800 | in robotics or AI, it seems to be like an onion.
00:04:32.720 | You seem to realize how much knowledge
00:04:34.920 | is really required to perform
00:04:36.280 | even some of these basic tasks.
00:04:38.720 | Do you have that sense as well?
00:04:41.440 | So how do we get all those details?
00:04:45.480 | Are they written down somewhere?
00:04:47.360 | Do they have to be learned through experience?
00:04:50.920 | - So I think when you're talking about
00:04:53.040 | sort of the physics, the basic physics around us,
00:04:56.400 | for example, acquiring information about,
00:04:58.240 | acquiring how that works.
00:04:59.840 | Yeah, I think there's a combination of things going on.
00:05:06.320 | I think there is fundamental pattern matching,
00:05:09.480 | like what we were talking about before,
00:05:11.360 | where you see enough examples,
00:05:12.760 | enough data about something,
00:05:13.840 | you start assuming that.
00:05:15.560 | And with similar input,
00:05:17.160 | I'm gonna predict similar outputs.
00:05:19.400 | You can't necessarily explain it at all.
00:05:21.760 | You may learn very quickly that when you let something go,
00:05:25.280 | it falls to the ground.
00:05:27.560 | - That's such a--
00:05:29.520 | - But you can't necessarily explain that.
00:05:31.440 | - But that's such a deep idea,
00:05:34.000 | that if you let something go,
00:05:35.560 | like the idea of gravity.
00:05:36.860 | - I mean, people are letting things go
00:05:39.560 | and counting on them falling well
00:05:40.920 | before they understood gravity.
00:05:42.400 | - But that seems to be, that's exactly what I mean.
00:05:45.520 | Is before you take a physics class
00:05:47.760 | or study anything about Newton,
00:05:51.240 | just the idea that stuff falls to the ground
00:05:54.200 | and then be able to generalize
00:05:57.000 | that all kinds of stuff falls to the ground.
00:06:00.200 | It just seems like a non,
00:06:03.880 | without encoding it, like hard coding it in,
00:06:06.880 | it seems like a difficult thing to pick up.
00:06:09.080 | It seems like you have to have a lot of different knowledge
00:06:13.040 | to be able to integrate that into the framework,
00:06:17.040 | sort of into everything else.
00:06:19.400 | So both know that stuff falls to the ground
00:06:22.040 | and start to reason about sociopolitical discourse.
00:06:27.040 | So both, like the very basic
00:06:30.280 | and the high level reasoning decision making.
00:06:34.240 | I guess my question is, how hard is this problem?
00:06:36.680 | Sorry to linger on it because again,
00:06:40.720 | and we'll get to it for sure,
00:06:42.800 | what Watson with Jeopardy did
00:06:44.680 | is take on a problem that's much more constrained
00:06:47.160 | but has the same hugeness of scale,
00:06:49.920 | at least from the outsider's perspective.
00:06:52.320 | So I'm asking the general life question
00:06:54.560 | of to be able to be an intelligent being
00:06:57.280 | and reasoning in the world
00:06:59.200 | about both gravity and politics,
00:07:02.560 | how hard is that problem?
00:07:03.800 | - So I think it's solvable.
00:07:07.840 | - Okay, now beautiful.
00:07:12.360 | So what about time travel?
00:07:16.000 | Okay, on that topic.
00:07:18.760 | - I'm not as convinced.
00:07:21.360 | - Not as convinced yet, okay.
00:07:22.760 | - No, I think it is solvable.
00:07:25.920 | I mean, I think that it's a,
00:07:28.160 | first of all, it's about getting machines to learn.
00:07:30.120 | Learning is fundamental.
00:07:33.040 | And I think we're already in a place that we understand,
00:07:36.080 | for example, how machines can learn in various ways.
00:07:40.280 | Right now, our learning stuff is sort of primitive
00:07:44.120 | in that we haven't sort of taught machines
00:07:49.120 | to learn the frameworks.
00:07:50.920 | We don't communicate our frameworks
00:07:52.840 | because of our shared, in some cases we do,
00:07:54.520 | but we don't annotate, if you will,
00:07:58.040 | all the data in the world with the frameworks
00:08:00.640 | that are inherent or underlying our understanding.
00:08:04.800 | Instead, we just operate with the data.
00:08:07.840 | So if we wanna be able to reason over the data
00:08:10.760 | in similar terms in the common frameworks,
00:08:14.000 | we need to be able to teach the computer,
00:08:15.440 | or at least we need to program the computer to acquire,
00:08:19.280 | to have access to and acquire, learn the frameworks as well
00:08:24.280 | and connect the frameworks to the data.
00:08:27.400 | I think this can be done.
00:08:30.100 | I think we can start, I think machine learning,
00:08:34.640 | for example, with enough examples
00:08:37.760 | can start to learn these basic dynamics.
00:08:40.600 | Will they relate them necessarily to gravity?
00:08:43.920 | Not unless they can also acquire those theories as well
00:08:48.800 | and put the experiential knowledge
00:08:52.600 | and connect it back to the theoretical knowledge.
00:08:55.080 | I think if we think in terms of this class of architectures
00:08:58.880 | that are designed to both learn the specifics,
00:09:02.720 | find the patterns, but also acquire the frameworks
00:09:05.880 | and connect the data to the frameworks,
00:09:08.000 | if we think in terms of robust architectures like this,
00:09:11.400 | I think there is a path toward getting there.
00:09:15.080 | - In terms of encoding architectures like that,
00:09:17.880 | do you think systems that are able to do this
00:09:20.880 | will look like neural networks or representing,
00:09:26.620 | if you look back to the '80s and '90s,
00:09:28.640 | with the expert systems, so more like graphs,
00:09:33.380 | systems that are based in logic,
00:09:35.260 | able to contain a large amount of knowledge
00:09:38.220 | where the challenge was the automated acquisition
00:09:40.180 | of that knowledge.
00:09:41.540 | I guess the question is,
00:09:43.540 | when you collect both the frameworks
00:09:45.500 | and the knowledge from the data,
00:09:46.980 | what do you think that thing will look like?
00:09:48.980 | - Yeah, so I mean, I think asking the question
00:09:51.060 | do they look like neural networks is a bit of a red herring.
00:09:52.960 | I mean, I think that they will certainly do inductive
00:09:56.860 | or pattern-match-based reasoning.
00:09:58.420 | And I've already experimented with architectures
00:10:00.660 | that combine both, that use machine learning
00:10:04.380 | and neural networks to learn certain classes of knowledge,
00:10:07.020 | in other words, to find repeated patterns
00:10:08.980 | in order for it to make good inductive guesses,
00:10:13.220 | but then ultimately to try to take those learnings
00:10:16.940 | and marry them, in other words, connect them to frameworks
00:10:21.220 | so that it can then reason over that
00:10:23.180 | in terms other humans understand.
00:10:25.340 | So for example, at Elemental Cognition, we do both.
00:10:27.780 | We have architectures that do both.
00:10:30.380 | But both those things, but also have a learning method
00:10:33.340 | for acquiring the frameworks themselves and saying,
00:10:36.020 | "Look, ultimately I need to take this data.
00:10:38.940 | I need to interpret it in the form of these frameworks
00:10:41.700 | so they can reason over it."
00:10:42.540 | So there is a fundamental knowledge representation,
00:10:45.020 | like what you're saying,
00:10:45.900 | like these graphs of logic, if you will.
00:10:48.460 | There are also neural networks
00:10:50.960 | that acquire a certain class of information.
00:10:53.280 | Then they align them with these frameworks.
00:10:57.580 | But there's also a mechanism
00:10:58.820 | to acquire the frameworks themselves.
00:11:00.860 | - Yeah, so it seems like the idea of frameworks
00:11:04.220 | requires some kind of collaboration with humans.
00:11:07.060 | - Absolutely.
00:11:07.980 | - So do you think of that collaboration as--
00:11:10.980 | - Well, and let's be clear.
00:11:13.580 | Only for the express purpose that you're designing
00:11:18.580 | an intelligence that can ultimately communicate with humans
00:11:24.220 | in the terms of frameworks that help them understand things.
00:11:28.740 | So to be really clear,
00:11:31.100 | you can independently create a machine learning system,
00:11:36.100 | an intelligence that I might call an alien intelligence
00:11:40.180 | that does a better job than you with some things,
00:11:42.860 | but can't explain the framework to you.
00:11:45.220 | That doesn't mean it might be better than you at the thing.
00:11:48.420 | It might be that you cannot comprehend the framework
00:11:51.220 | that it may have created for itself
00:11:53.060 | that is inexplicable to you.
00:11:55.620 | That's a reality.
00:11:56.980 | - But you're more interested in a case where you can.
00:12:00.500 | - I am, yeah.
00:12:02.780 | My sort of approach to AI is
00:12:04.940 | because I've set the goal for myself.
00:12:07.600 | I want machines to be able to ultimately communicate
00:12:10.460 | understanding with humans.
00:12:13.100 | I want them to be able to acquire and communicate,
00:12:15.180 | acquire knowledge from humans
00:12:16.420 | and communicate knowledge to humans.
00:12:18.700 | They should be using what inductive
00:12:23.300 | machine learning techniques are good at,
00:12:25.400 | which is to observe patterns of data,
00:12:28.500 | whether it be in language or whether it be in images
00:12:30.940 | or videos or whatever,
00:12:33.060 | to acquire these patterns,
00:12:37.120 | to induce the generalizations from those patterns,
00:12:41.040 | but then ultimately work with humans
00:12:42.900 | to connect them to frameworks, interpretations, if you will,
00:12:46.340 | that ultimately make sense to humans.
00:12:48.380 | Of course, the machine is gonna have the strength
00:12:50.140 | that it has, the richer, longer memory,
00:12:53.120 | but it has the more rigorous reasoning abilities,
00:12:57.080 | the deeper reasoning abilities,
00:12:58.740 | so it'll be an interesting complementary relationship
00:13:02.780 | between the human and the machine.
00:13:04.900 | - Do you think that ultimately needs explainability
00:13:06.820 | like a machine?
00:13:07.660 | So if we look, we study, for example,
00:13:09.580 | Tesla Autopilot a lot, where humans,
00:13:12.500 | I don't know if you've driven the vehicle,
00:13:13.780 | or are aware of what they're doing.
00:13:16.060 | So you're basically, the human and machine
00:13:20.780 | are working together there,
00:13:21.980 | and the human is responsible for their own life
00:13:24.160 | to monitor the system,
00:13:25.880 | and the system fails every few miles.
00:13:30.040 | And so there's hundreds,
00:13:32.160 | there's millions of those failures a day.
00:13:35.280 | And so that's like a moment of interaction.
00:13:37.420 | Do you see--
00:13:38.260 | - Yeah, no, that's exactly right.
00:13:39.560 | That's a moment of interaction
00:13:41.560 | where the machine has learned some stuff,
00:13:45.520 | it has a failure, somehow the failure's communicated,
00:13:50.360 | the human is now filling in the mistake, if you will,
00:13:53.520 | or maybe correcting or doing something
00:13:55.260 | that is more successful in that case,
00:13:57.500 | the computer takes that learning.
00:13:59.520 | So I believe that the collaboration
00:14:01.900 | between human and machine,
00:14:03.940 | I mean, that's sort of a primitive example
00:14:05.540 | and sort of a more,
00:14:06.660 | another example is where the machine's
00:14:10.100 | literally talking to you and saying,
00:14:11.500 | "Look, I'm reading this thing.
00:14:14.320 | "I know that the next word might be this or that,
00:14:18.160 | "but I don't really understand why.
00:14:20.480 | "I have my guess.
00:14:21.540 | "Can you help me understand the framework
00:14:23.540 | "that supports this?"
00:14:25.740 | And then can kind of acquire that,
00:14:27.720 | take that and reason about it and reuse it
00:14:29.820 | the next time it's reading to try to understand something.
00:14:32.180 | Not unlike a human student might do.
00:14:36.420 | I mean, I remember when my daughter was in first grade
00:14:39.140 | and she had a reading assignment about electricity.
00:14:42.920 | And somewhere in the text it says,
00:14:47.220 | "Electricity is produced by water flowing over turbines,"
00:14:50.340 | or something like that.
00:14:51.600 | And then there's a question that says,
00:14:52.900 | "Well, how is electricity created?"
00:14:54.860 | And so my daughter comes to me and says,
00:14:56.860 | "I mean, I could, you know,
00:14:58.160 | "created and produced are kind of synonyms in this case.
00:15:00.840 | "So I can go back to the text
00:15:02.220 | "and I can copy 'by water flowing over turbines,'
00:15:05.300 | "but I have no idea what that means.
00:15:07.760 | "Like, I don't know how to interpret
00:15:09.240 | "water flowing over turbines and what electricity even is.
00:15:12.000 | "I mean, I can get the answer right by matching the text,
00:15:15.620 | "but I don't have any framework for understanding
00:15:17.740 | "what this means at all."
00:15:19.560 | - And framework really is, I mean, it's a set of,
00:15:22.220 | not to be mathematical, but axioms of ideas
00:15:25.860 | that you bring to the table in interpreting stuff
00:15:28.060 | and then you build those up somehow.
00:15:29.980 | - You build them up with the expectation
00:15:32.140 | that there's a shared understanding of what they are.
00:15:35.460 | - Shared, yeah, it's the social, the us humans.
00:15:39.500 | Do you have a sense that humans on Earth in general
00:15:43.780 | share a set of, like how many frameworks are there?
00:15:48.220 | - I mean, it depends on how you bound them, right?
00:15:49.900 | So in other words, how big or small
00:15:51.620 | like their individual scope,
00:15:53.500 | but there's lots and there are new ones.
00:15:55.900 | I think the way I think about it is kind of in a layer.
00:15:59.300 | I think of the architecture as being layered in that
00:16:01.740 | there's a small set of primitives
00:16:05.260 | that allow you the foundation to build frameworks.
00:16:07.940 | And then there may be many frameworks,
00:16:10.060 | but you have the ability to acquire them.
00:16:12.300 | And then you have the ability to reuse them.
00:16:14.740 | I mean, one of the most compelling ways of thinking
00:16:17.020 | about this is reasoning by analogy, where I can say,
00:16:19.500 | oh, wow, I've learned something very similar.
00:16:21.740 | I never heard of this game soccer,
00:16:26.940 | but if it's like basketball in the sense that the goal's
00:16:30.340 | like the hoop and I have to get the ball in the hoop
00:16:32.700 | and I have guards and I have this and I have that,
00:16:35.200 | like where are the similarities and where are
00:16:38.460 | the differences and I have a foundation now
00:16:40.820 | for interpreting this new information.
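The analogy step described here (line up a known game's roles with a new one, then ask where they match and where they differ) can be sketched in a few lines. This is only an illustration under stated assumptions, not the speaker's system: the `Framework` type, the `align` function, and every role name are hypothetical, and a framework is assumed to be nothing more than a flat mapping from roles to fillers.

```python
from typing import Dict, Tuple

# Hypothetical representation: a framework maps role names to fillers.
Framework = Dict[str, str]


def align(known: Framework, new: Framework) -> Tuple[Dict[str, str], Dict[str, Tuple[str, str]]]:
    """Compare two frameworks role by role.

    Returns (same, mapped):
      same   - roles whose filler is identical in both frameworks
      mapped - roles present in both but filled differently,
               as (known_filler, new_filler) pairs
    """
    shared_roles = sorted(known.keys() & new.keys())
    same = {r: known[r] for r in shared_roles if known[r] == new[r]}
    mapped = {r: (known[r], new[r]) for r in shared_roles if known[r] != new[r]}
    return same, mapped


# Illustrative role fillers only; not taken from any real knowledge base.
basketball: Framework = {
    "objective": "get the ball into the target",
    "target": "hoop",
    "ball_moved_with": "hands",
}
soccer: Framework = {
    "objective": "get the ball into the target",
    "target": "goal",
    "ball_moved_with": "feet",
}

same, mapped = align(basketball, soccer)
for role, filler in same.items():
    print(f"similar:   {role} = {filler}")
for role, (old, new_filler) in mapped.items():
    print(f"different: {role}: {old} -> {new_filler}")
```

The roles that match give the foundation for interpreting the new game, and the roles that map to different fillers are the specifics still to be learned.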
00:16:43.060 | - And then the different groups,
00:16:44.960 | like the millennials will have a framework
00:16:48.060 | and then, you know, Democrats and Republicans.
00:16:53.060 | Millennials, nobody wants that framework.
00:16:55.500 | - Well, I mean, I think--
00:16:56.340 | - Nobody understands it.
00:16:57.260 | - Right, I mean, I think we're talking about political
00:16:58.780 | and social ways of interpreting the world around them.
00:17:01.540 | And I think these frameworks are still largely,
00:17:03.660 | largely similar.
00:17:04.500 | I think they differ in maybe what some fundamental
00:17:07.140 | assumptions and values are.
00:17:09.040 | Now, from a reasoning perspective,
00:17:11.540 | like the ability to process the framework
00:17:13.340 | might not be that different.
00:17:15.820 | The implications of different fundamental values
00:17:18.220 | or fundamental assumptions in those frameworks
00:17:21.140 | may reach very different conclusions.
00:17:23.820 | So from a social perspective,
00:17:26.460 | the conclusions may be very different.
00:17:28.560 | From an intelligence perspective,
00:17:30.100 | I just followed where my assumptions took me.
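The point that the reasoning process can be the same while the conclusions differ can be made concrete with a small forward-chaining sketch. Everything in it is an assumption for illustration: the `forward_chain` function, the rule format, and the neutral dam example are hypothetical and are not drawn from the conversation.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical rule format: (premises that must all hold, conclusion).
Rule = Tuple[FrozenSet[str], str]


def forward_chain(assumptions: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply the rules repeatedly until no new facts can be derived."""
    facts = set(assumptions)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


# One shared set of rules: the reasoning machinery is identical for everyone.
RULES: List[Rule] = [
    (frozenset({"dam_proposed", "values_cheap_energy"}), "supports_dam"),
    (frozenset({"dam_proposed", "values_river_habitat"}), "opposes_dam"),
]

# Same statement, different fundamental assumptions.
view_a = forward_chain({"dam_proposed", "values_cheap_energy"}, RULES)
view_b = forward_chain({"dam_proposed", "values_river_habitat"}, RULES)

# The symmetric difference shows exactly where the two views part ways.
differences = view_a ^ view_b
print(sorted(differences))
```

Both views run through the same `forward_chain` loop; only the starting assumptions differ, and the difference set traces the disagreement back to them.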
00:17:33.300 | - Yeah, the process itself will look similar,
00:17:35.100 | but that's a fascinating idea that frameworks
00:17:39.440 | really help carve how a statement will be interpreted.
00:17:44.440 | I mean, having a Democrat and a Republican framework
00:17:50.400 | and then read the exact same statement
00:17:53.840 | and the conclusions that you derive
00:17:55.880 | will be totally different from an AI perspective
00:17:58.240 | is fascinating.
00:17:59.280 | - What we would want out of the AI is to be able to tell you
00:18:02.820 | that this perspective, one perspective,
00:18:05.400 | one set of assumptions is gonna lead you here,
00:18:07.200 | another set of assumptions is gonna lead you there.
00:18:10.360 | And in fact, to help people reason and say,
00:18:13.080 | oh, I see where our differences lie.
00:18:16.880 | I have this fundamental belief about that,
00:18:18.620 | I have this fundamental belief about that.
00:18:20.880 | - Yeah, that's quite brilliant.
00:18:21.760 | From my perspective in NLP, there's this idea
00:18:25.000 | that there's one way to really understand a statement,
00:18:27.760 | but there probably isn't.
00:18:30.440 | There's probably an infinite number of ways
00:18:31.800 | to understand a statement.
00:18:32.640 | - Well, there's lots of different interpretations
00:18:35.100 | and the broader the content, the richer it is.
00:18:40.100 | And so, you and I can have very different experiences
00:18:46.960 | with the same text, obviously.
00:18:49.120 | And if we're committed to understanding each other,
00:18:53.020 | we start, and that's the other important point,
00:18:56.960 | if we're committed to understanding each other,
00:18:59.440 | we start decomposing and breaking down our interpretation
00:19:03.520 | to its more and more primitive components
00:19:05.680 | until we get to that point where we say,
00:19:07.560 | oh, I see why we disagree.
00:19:09.920 | And we try to understand how fundamental
00:19:12.160 | that disagreement really is.
00:19:13.880 | But that requires a commitment to breaking down
00:19:17.260 | that interpretation in terms of that framework
00:19:19.400 | in a logical way.
00:19:20.600 | Otherwise, and this is why I think of AI
00:19:24.440 | as really complementing and helping human intelligence
00:19:27.680 | to overcome some of its biases and its predisposition
00:19:31.560 | to be persuaded by more shallow reasoning
00:19:36.560 | in the sense that we get over this idea,
00:19:38.640 | well, I'm right because I'm Republican
00:19:41.680 | or I'm right because I'm Democratic
00:19:43.040 | and someone labeled this as a Democratic point of view
00:19:45.020 | or it has the following keywords in it.
00:19:47.060 | And if the machine can help us break that argument down
00:19:50.160 | and say, wait a second,
00:19:51.320 | what do you really think about this?
00:19:53.920 | So, essentially, holding us accountable
00:19:57.140 | to doing more critical thinking.
00:19:59.200 | - We're gonna have to sit and think about that as fast.
00:20:01.960 | I love that.
00:20:02.800 | I think that's a really empowering use of AI
00:20:05.240 | for the public discourse
00:20:06.520 | that's completely disintegrating currently
00:20:10.240 | as we learn how to do it on social media.
00:20:12.080 | - That's right.
00:20:12.960 | - Thank you.
00:20:13.880 | (audience applauding)