Yann LeCun: Can Neural Networks Reason? | AI Podcast Clips


Chapters

0:00 Can Neural Networks Reason
0:15 Discrete Reasoning
2:15 Working Memory
3:44 Transformer
5:18 Energy minimization

Whisper Transcript

00:00:00.000 | (gentle music)
00:00:02.580 | - Do you think neural networks can be made to reason?
00:00:10.280 | - Yes, there is no question about that.
00:00:12.320 | Again, we have a good example, right?
00:00:14.160 | The question is how.
00:00:17.000 | So the question is how much prior structure
00:00:19.320 | do you have to put in the neural net
00:00:20.680 | so that something like human reasoning
00:00:22.880 | will emerge from it, from learning?
00:00:26.040 | Another question is that all of our models
00:00:29.880 | of what reasoning is, the ones based on logic,
00:00:32.520 | are discrete and are therefore incompatible
00:00:36.400 | with gradient-based learning.
00:00:38.000 | And I'm a very strong believer in this idea
00:00:39.920 | of gradient-based learning.
00:00:41.120 | I don't believe in other types of learning
00:00:44.560 | that don't use gradient information, if you want.
00:00:47.160 | - So you don't like discrete mathematics?
00:00:48.640 | You don't like anything discrete?
00:00:50.240 | - Well, it's not that I don't like it.
00:00:52.160 | It's just that it's incompatible with learning
00:00:54.320 | and I'm a big fan of learning, right?
00:00:56.280 | So in fact, that's perhaps one reason why
00:01:00.240 | deep learning has been kind of looked at with suspicion
00:01:02.800 | by a lot of computer scientists,
00:01:03.840 | because the math is very different.
00:01:05.040 | The math that you use for deep learning
00:01:07.560 | has more to do with cybernetics,
00:01:12.320 | the kind of math you do in electrical engineering,
00:01:14.800 | than the kind of math you do in computer science.
00:01:17.360 | And nothing in machine learning is exact, right?
00:01:20.800 | Computer science is all about sort of
00:01:23.120 | obsessive-compulsive attention to details
00:01:26.520 | of like every index has to be right.
00:01:28.800 | And you can prove that an algorithm is correct, right?
00:01:31.760 | Machine learning is the science of sloppiness, really.
00:01:35.360 | - That's beautiful.
00:01:38.080 | So, okay, maybe let's feel around in the dark
00:01:43.080 | of what is a neural network that reasons
00:01:46.360 | or a system that works with continuous functions
00:01:51.360 | that's able to build knowledge,
00:01:56.960 | however we think about reasoning,
00:01:58.840 | build on previous knowledge, build on extra knowledge,
00:02:02.440 | create new knowledge,
00:02:04.080 | generalize outside of any training set ever built.
00:02:07.640 | What does that look like?
00:02:08.920 | Maybe, do you have inklings of thoughts
00:02:13.320 | of what that might look like?
00:02:15.400 | - Yeah, I mean, yes and no.
00:02:16.880 | If I had precise ideas about this,
00:02:18.760 | I think we'd be building it right now.
00:02:21.840 | And there are people working on this
00:02:23.640 | whose main research interest is actually exactly that, right?
00:02:26.800 | So what you need to have is a working memory.
00:02:29.880 | So you need to have some device, if you want,
00:02:34.480 | some subsystem that can store a relatively large amount
00:02:39.120 | of factual, episodic information
00:02:41.760 | for a reasonable amount of time.
00:02:45.480 | So in the brain, for example,
00:02:48.440 | there are kind of three main types of memory.
00:02:50.320 | One is the sort of memory of the state of your cortex.
00:02:56.120 | And that sort of disappears within 20 seconds.
00:02:58.280 | You can't remember things for more than about 20 seconds
00:03:00.680 | or a minute if you don't have any other form of memory.
00:03:04.960 | The second type of memory, which is longer term,
00:03:07.120 | but still short term, is the hippocampus.
00:03:08.840 | So you can, you know, you came into this building,
00:03:11.040 | you remember where the exit is, where the elevators are.
00:03:15.560 | You have some map of that building
00:03:18.360 | that's stored in your hippocampus.
00:03:20.280 | You might remember something about what I said.
00:03:23.080 | You know, a few minutes ago.
00:03:24.400 | - I forgot it all already, but it's part.
00:03:25.240 | - It's been erased, but you know,
00:03:26.880 | but that would be in your hippocampus.
00:03:30.200 | And then the longer term memory is in the synapse,
00:03:33.320 | the synapses, right?
00:03:34.640 | So what you need if you want a system
00:03:37.480 | that's capable of reasoning
00:03:38.320 | is that you want the hippocampus like thing, right?
00:03:41.440 | And that's what people have tried to do
00:03:44.560 | with memory networks and, you know,
00:03:46.520 | neural Turing machines and stuff like that, right?
00:03:48.520 | And now with transformers, which have sort of
00:03:51.360 | a memory in their kind of self-attention system,
00:03:54.640 | you can think of it this way.
00:03:56.000 | So that's one element you need.
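
A minimal sketch of the point being made here about self-attention acting as a memory, assuming nothing beyond standard scaled dot-product attention: a query vector softly retrieves the stored value whose key it most resembles. All names, shapes, and data below are illustrative, not from the conversation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_read(query, keys, values):
    """Read from a key-value memory with scaled dot-product attention.

    query:  (d,)    what we are looking for
    keys:   (n, d)  addresses of the n stored items
    values: (n, d)  contents of the n stored items
    """
    scores = keys @ query / np.sqrt(query.shape[0])    # similarity to each memory slot
    weights = softmax(scores)                          # soft, differentiable addressing
    return weights @ values                            # weighted blend of stored contents

# A toy memory holding three stored items.
keys = np.random.randn(3, 8)
values = np.random.randn(3, 8)
query = keys[1] + 0.1 * np.random.randn(8)      # ask for something close to item 1
recalled = attention_read(query, keys, values)  # mostly values[1]
```

The soft addressing is what keeps the lookup differentiable, which is what makes this kind of memory compatible with the gradient-based learning discussed earlier.
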
00:03:59.800 | Another thing you need is some sort of network
00:04:02.600 | that can access this memory,
00:04:05.720 | get an information back, and then kind of crunch on it,
00:04:10.680 | and then do this iteratively multiple times,
00:04:13.400 | because a chain of reasoning is a process
00:04:18.400 | by which you update your knowledge
00:04:23.400 | about the state of the world,
00:04:24.960 | about what's going to happen, et cetera.
00:04:27.360 | And that has to be this sort of
00:04:30.000 | recurrent operation, basically.
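
As a hedged illustration of the "access the memory, crunch on it, repeat" loop described above, the sketch below reuses `attention_read`, `keys`, `values`, and `query` from the previous snippet and iterates the read for a fixed number of hops, updating a state vector each time; the tanh update is a placeholder for what would be a learned network.

```python
def reason(state, keys, values, hops=3):
    """Toy multi-hop loop: read from memory, update the state, repeat.

    This is only the recurrent 'chain of reasoning' pattern, not a trained model;
    the tanh combination below stands in for a learned update network.
    """
    for _ in range(hops):
        recalled = attention_read(state, keys, values)  # query memory with the current state
        state = np.tanh(state + recalled)               # fold the recollection into the state
    return state

updated = reason(query, keys, values, hops=3)
```
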
00:04:31.680 | - And you think that kind of,
00:04:33.720 | if we think about a transformer,
00:04:35.680 | it seems to be too small to contain,
00:04:38.560 | to represent the knowledge
00:04:41.840 | that's contained in Wikipedia, for example.
00:04:43.800 | - Well, a transformer doesn't have this idea of recurrence.
00:04:46.560 | It's got a fixed number of layers,
00:04:47.640 | and that's the number of steps that limits,
00:04:50.160 | basically, its representation.
00:04:51.680 | - But recurrence would build on the knowledge somehow.
00:04:55.800 | I mean, it would evolve the knowledge
00:04:59.280 | and expand the amount of information, perhaps,
00:05:02.600 | or useful information within that knowledge.
00:05:04.880 | But is this something that just can emerge with size?
00:05:09.320 | Because it seems like everything we have now is too small.
00:05:11.040 | - Not just.
00:05:11.880 | No, it's not clear.
00:05:13.880 | I mean, how you access and write
00:05:15.720 | into an associative memory in an efficient
00:05:17.600 | way, I mean, sort of the original memory network
00:05:19.760 | maybe had something like the right architecture,
00:05:22.080 | but if you try to scale up a memory network
00:05:25.080 | so that the memory contains all of Wikipedia,
00:05:27.400 | it doesn't quite work.
00:05:28.560 | - Right.
00:05:29.640 | - So there's a need for new ideas there, okay.
00:05:33.200 | But it's not the only form of reasoning.
00:05:34.520 | So there's another form of reasoning,
00:05:35.920 | which is very classical
00:05:38.400 | in some types of AI,
00:05:41.600 | and it's based on, let's call it energy minimization.
00:05:45.440 | Okay, so you have some sort of objective,
00:05:49.480 | some energy function that represents
00:05:51.360 | the quality or the negative quality, okay.
00:05:57.840 | Energy goes up when things get bad
00:05:59.240 | and goes down when things get good.
00:06:01.840 | So let's say you want to figure out,
00:06:04.960 | what gestures do I need to do
00:06:07.480 | to grab an object or walk out the door?
00:06:11.760 | If you have a good model of your own body,
00:06:14.840 | a good model of the environment,
00:06:17.000 | using this kind of energy minimization,
00:06:18.960 | you can do planning.
00:06:21.440 | And in optimal control,
00:06:23.760 | it's called model predictive control.
00:06:26.640 | You have a model of what's gonna happen in the world
00:06:28.640 | as a consequence of your actions.
00:06:30.080 | And that allows you to, by energy minimization,
00:06:33.120 | figure out a sequence of action
00:06:34.360 | that optimizes a particular objective function,
00:06:36.640 | which minimizes the number of times
00:06:38.720 | you're gonna hit something
00:06:39.560 | and the energy you're gonna spend
00:06:41.080 | doing the gesture and et cetera.
00:06:44.360 | So that's a form of reasoning.
00:06:47.000 | Planning is a form of reasoning.
00:06:48.080 | And perhaps what led to the ability of humans to reason
00:06:52.600 | is the fact that we, or species that appeared before us,
00:06:57.600 | had to do some sort of planning
00:06:59.600 | to be able to hunt and survive
00:07:01.520 | and survive the winter in particular.
00:07:04.160 | And so it's the same capacity that you need to have.
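
To make the energy-minimization picture concrete, here is a hedged sketch of model predictive control on a toy one-dimensional point mass. The dynamics, cost terms, horizon, and constants are all assumptions chosen for illustration; the only point is that planning becomes gradient descent on an energy defined over a candidate action sequence.

```python
import numpy as np

def rollout(actions, x0=0.0, v0=0.0, dt=0.1):
    """Predict the positions of a 1-D point mass driven by a sequence of accelerations."""
    x, v, xs = x0, v0, []
    for a in actions:
        v = v + a * dt
        x = x + v * dt
        xs.append(x)
    return np.array(xs)

def energy(actions, goal=1.0):
    """High when the final position misses the goal or when the effort is large."""
    xs = rollout(actions)
    return (xs[-1] - goal) ** 2 + 0.001 * np.sum(actions ** 2)

# Plan by gradient descent on the energy over a 10-step action sequence,
# using central-difference gradients to keep the example dependency-free.
actions = np.zeros(10)
eps, lr = 1e-4, 0.5
for _ in range(200):
    grad = np.array([
        (energy(actions + eps * np.eye(10)[i]) - energy(actions - eps * np.eye(10)[i])) / (2 * eps)
        for i in range(10)
    ])
    actions -= lr * grad

print(rollout(actions)[-1])  # final position ends up close to the goal of 1.0
```
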
00:07:07.880 | - So your intuition is,
00:07:11.000 | if we look at expert systems,
00:07:13.840 | that encoding knowledge as logic systems,
00:07:17.760 | as graphs, or in this kind of way,
00:07:21.280 | is not a useful way to think about knowledge?
00:07:24.840 | - Graphs are a little brittle, or logic representations.
00:07:28.520 | So basically, variables that have values
00:07:32.440 | and then constraints between them
00:07:33.840 | that are represented by rules,
00:07:35.840 | that's a little too rigid and too brittle, right?
00:07:37.400 | So some of the early efforts in that respect
00:07:43.200 | were to put probabilities on them.
00:07:45.560 | So a rule, if you have this and that symptom,
00:07:49.120 | you have this disease with that probability
00:07:51.760 | and you should prescribe that antibiotic
00:07:53.960 | with that probability, right?
00:07:55.080 | That's the MYCIN system from the seventies.
00:07:57.840 | And that's what that branch of AI led to:
00:08:02.080 | Bayesian networks and graphical models
00:08:04.840 | and causal inference and variational methods.
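
As a hedged illustration of the kind of probabilistic rule being described, rather than anything from the actual MYCIN rule base, the snippet below turns "if this symptom, then this disease with some probability" into a Bayes' rule update; every number in it is invented for the example.

```python
# Toy "probabilistic rule": P(disease | symptom) via Bayes' rule.
# Every number here is invented purely for illustration.
p_disease = 0.01                  # prior probability of the disease
p_symptom_given_disease = 0.90    # "if you have the disease, you show the symptom"
p_symptom_given_healthy = 0.05    # false-positive rate of the symptom

p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom

print(round(p_disease_given_symptom, 3))  # ≈ 0.154 with these made-up numbers
```
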
00:08:09.480 | So there is, I mean, certainly a lot of interesting work
00:08:14.480 | going on in this area.
00:08:15.960 | The main issue with this is knowledge acquisition.
00:08:18.400 | How do you reduce a bunch of data to a graph of this type?
00:08:23.400 | - Yeah, it relies on the expert,
00:08:26.360 | on the human being, to encode, to add knowledge.
00:08:29.480 | - And that's essentially impractical.
00:08:31.640 | - Yeah, it's not scalable.
00:08:34.000 | - That's a big question.
00:08:34.840 | The second question is,
00:08:35.960 | do you want to represent knowledge as symbols
00:08:39.160 | and do you want to manipulate them with logic?
00:08:41.760 | And again, that's incompatible with learning.
00:08:43.840 | So one suggestion that Geoff Hinton
00:08:47.680 | has been advocating for many decades
00:08:49.560 | is to replace symbols by vectors.
00:08:53.840 | Think of it as pattern of activities
00:08:55.480 | in a bunch of neurons or units
00:08:57.840 | or whatever you want to call them
00:08:59.640 | and replace logic by continuous functions.
00:09:03.280 | Okay, and that becomes now compatible.
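
A hedged sketch of the suggestion as stated here: symbols become vectors and logical operations become continuous, differentiable functions of them. The embedding size, the sigmoid scoring, and the product t-norm for the soft AND are illustrative choices, not anything specified in the conversation; in a real system the embeddings would be trained by gradient descent rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Symbols as vectors: each discrete symbol gets an embedding (random here,
# learned in a real system).
embeddings = {name: rng.normal(size=16) for name in ["socrates", "human", "mortal"]}

def truth(subject, predicate):
    """Continuous 'truth value' in (0, 1): a differentiable stand-in for a predicate."""
    score = embeddings[subject] @ embeddings[predicate] / np.sqrt(16)
    return 1.0 / (1.0 + np.exp(-score))      # sigmoid keeps the value in (0, 1)

def soft_and(a, b):
    return a * b                              # product t-norm: a continuous AND

def soft_implies(a, b):
    return 1.0 - a + a * b                    # a continuous version of a -> b

# "socrates is human, and being human implies being mortal",
# expressed as a differentiable function of the embeddings.
is_human = truth("socrates", "human")
belief = soft_and(is_human, soft_implies(is_human, truth("socrates", "mortal")))
print(belief)  # arbitrary here because the embeddings are random, but fully differentiable
```
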
00:09:06.360 | There's a very good set of ideas
00:09:09.480 | written in a paper about 10 years ago
00:09:12.160 | by Léon Bottou, who is here at Facebook.
00:09:15.520 | The title of the paper is
00:09:18.920 | "From Machine Learning to Machine Reasoning."
00:09:20.360 | And his idea is that a learning system
00:09:24.000 | should be able to manipulate objects
00:09:25.400 | that are in a space
00:09:27.680 | and then put the result back in the same space.
00:09:29.440 | So it's this idea of working memory basically.
00:09:31.720 | And it's very enlightening.
00:09:35.200 | - And in a sense that might learn something
00:09:38.320 | like the simple expert systems.
00:09:41.480 | I mean, you can learn basic logic operations there.
00:09:46.640 | - Yeah, quite possibly.
00:09:47.960 | There's a big debate on sort of how much prior structure
00:09:51.240 | you have to put in for this kind of stuff to emerge.
00:09:53.640 | That's the debate I have with Gary Marcus
00:09:55.280 | and people like that.
00:09:56.320 | (upbeat music)