Yann LeCun: Can Neural Networks Reason? | AI Podcast Clips


Chapters

0:00 Can Neural Networks Reason
0:15 Discrete Reasoning
2:15 Working Memory
3:44 Transformer
5:18 Energy minimization

Whisper Transcript

00:00:00.000 | (gentle music)
00:00:02.580 | - Do you think neural networks can be made to reason?
00:00:10.280 | - Yes, there is no question about that.
00:00:12.320 | Again, we have a good example, right?
00:00:14.160 | The question is how.
00:00:17.000 | So the question is how much prior structure
00:00:19.320 | do you have to put in the neural net
00:00:20.680 | so that something like human reasoning
00:00:22.880 | will emerge from it, from learning?
00:00:26.040 | Another question is that all of our models
00:00:29.880 | of what reasoning is, the ones based on logic,
00:00:32.520 | are discrete and are therefore incompatible
00:00:36.400 | with gradient-based learning.
00:00:38.000 | And I'm a very strong believer in this idea
00:00:39.920 | of gradient-based learning.
00:00:41.120 | I don't believe in other types of learning
00:00:44.560 | that don't use gradient information, if you want.
00:00:47.160 | - So you don't like discrete mathematics?
00:00:48.640 | You don't like anything discrete?
00:00:50.240 | - Well, it's not that I don't like it.
00:00:52.160 | It's just that it's incompatible with learning
00:00:54.320 | and I'm a big fan of learning, right?
00:00:56.280 | So in fact, that's perhaps one reason why
00:01:00.240 | deep learning has been kind of looked at with suspicion
00:01:02.800 | by a lot of computer scientists,
00:01:03.840 | because the math is very different.
00:01:05.040 | The math that you use for deep learning
00:01:07.560 | has more to do with cybernetics,
00:01:12.320 | the kind of math you do in electrical engineering,
00:01:14.800 | than the kind of math you do in computer science.
00:01:17.360 | And nothing in machine learning is exact, right?
00:01:20.800 | Computer science is all about sort of
00:01:23.120 | obsessive-compulsive attention to details
00:01:26.520 | of like every index has to be right.
00:01:28.800 | And you can prove that an algorithm is correct, right?
00:01:31.760 | Machine learning is the science of sloppiness, really.
00:01:35.360 | - That's beautiful.
00:01:38.080 | So, okay, maybe let's feel around in the dark
00:01:43.080 | of what is a neural network that reasons
00:01:46.360 | or a system that works with continuous functions
00:01:51.360 | that's able to build knowledge,
00:01:56.960 | however we think about reasoning,
00:01:58.840 | build on previous knowledge, build on extra knowledge,
00:02:02.440 | create new knowledge,
00:02:04.080 | generalize outside of any training set ever built.
00:02:07.640 | What does that look like?
00:02:08.920 | Maybe, do you have inklings of thoughts
00:02:13.320 | of what that might look like?
00:02:15.400 | - Yeah, I mean, yes and no.
00:02:16.880 | If I had precise ideas about this,
00:02:18.760 | I think we'd be building it right now.
00:02:21.840 | And there are people working on this
00:02:23.640 | whose main research interest is actually exactly that, right?
00:02:26.800 | So what you need to have is a working memory.
00:02:29.880 | So you need to have some device, if you want,
00:02:34.480 | some subsystem that can store a relatively large amount
00:02:39.120 | of factual, episodic information
00:02:41.760 | for a reasonable amount of time.
00:02:45.480 | So in the brain, for example,
00:02:48.440 | there are kind of three main types of memory.
00:02:50.320 | One is the sort of memory of the state of your cortex.
00:02:56.120 | And that sort of disappears within 20 seconds.
00:02:58.280 | You can't remember things for more than about 20 seconds
00:03:00.680 | or a minute if you don't have any other form of memory.
00:03:04.960 | The second type of memory, which is longer term,
00:03:07.120 | but still short term, is the hippocampus.
00:03:08.840 | So you can, you know, you came into this building,
00:03:11.040 | you remember where the exit is, where the elevators are.
00:03:15.560 | You have some map of that building
00:03:18.360 | that's stored in your hippocampus.
00:03:20.280 | You might remember something about what I said.
00:03:23.080 | You know, a few minutes ago.
00:03:24.400 | - I forgot it all already, but it's part.
00:03:25.240 | - It's been erased, but you know,
00:03:26.880 | but that would be in your hippocampus.
00:03:30.200 | And then the longer term memory is in the synapse,
00:03:33.320 | the synapses, right?
00:03:34.640 | So what you need if you want a system
00:03:37.480 | that's capable of reasoning
00:03:38.320 | is that you want the hippocampus like thing, right?
00:03:41.440 | And that's what people have tried to do
00:03:44.560 | with memory networks and, you know,
00:03:46.520 | neural Turing machines and stuff like that, right?
00:03:48.520 | And now with transformers, which have sort of
00:03:51.360 | a memory in their kind of self-attention system,
00:03:54.640 | you can think of it this way.
00:03:56.000 | So that's one element you need.
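
A minimal sketch of the point being made here about self-attention acting as a memory, assuming nothing beyond standard scaled dot-product attention: a query vector softly retrieves the stored value whose key it most resembles. All names, shapes, and data below are illustrative, not from the conversation.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_read(query, keys, values):
    """Read from a key-value memory with scaled dot-product attention.

    query:  (d,)    what we are looking for
    keys:   (n, d)  addresses of the n stored items
    values: (n, d)  contents of the n stored items
    """
    scores = keys @ query / np.sqrt(query.shape[0])    # similarity to each memory slot
    weights = softmax(scores)                          # soft, differentiable addressing
    return weights @ values                            # weighted blend of stored contents

# A toy memory holding three stored items.
keys = np.random.randn(3, 8)
values = np.random.randn(3, 8)
query = keys[1] + 0.1 * np.random.randn(8)      # ask for something close to item 1
recalled = attention_read(query, keys, values)  # mostly values[1]
```

The soft addressing is what keeps the lookup differentiable, which is what makes this kind of memory compatible with the gradient-based learning discussed earlier.
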
00:03:59.800 | Another thing you need is some sort of network
00:04:02.600 | that can access this memory,
00:04:05.720 | get an information back, and then kind of crunch on it,
00:04:10.680 | and then do this iteratively multiple times,
00:04:13.400 | because a chain of reasoning is a process
00:04:18.400 | by which you update your knowledge
00:04:23.400 | about the state of the world,
00:04:24.960 | about what's going to happen, et cetera.
00:04:27.360 | And that has to be this sort of
00:04:30.000 | recurrent operation, basically.
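
As a hedged illustration of the "access the memory, crunch on it, repeat" loop described above, the sketch below reuses `attention_read`, `keys`, `values`, and `query` from the previous snippet and iterates the read for a fixed number of hops, updating a state vector each time; the tanh update is a placeholder for what would be a learned network.

```python
def reason(state, keys, values, hops=3):
    """Toy multi-hop loop: read from memory, update the state, repeat.

    This is only the recurrent 'chain of reasoning' pattern, not a trained model;
    the tanh combination below stands in for a learned update network.
    """
    for _ in range(hops):
        recalled = attention_read(state, keys, values)  # query memory with the current state
        state = np.tanh(state + recalled)               # fold the recollection into the state
    return state

updated = reason(query, keys, values, hops=3)
```
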
00:04:31.680 | - And you think that kind of,
00:04:33.720 | if we think about a transformer,
00:04:35.680 | it seems to be too small to contain,
00:04:38.560 | to represent the knowledge
00:04:41.840 | that's contained in Wikipedia, for example.
00:04:43.800 | - Well, a transformer doesn't have this idea of recurrence.
00:04:46.560 | It's got a fixed number of layers,
00:04:47.640 | and that's the number of steps that limits,
00:04:50.160 | basically, its representation.
00:04:51.680 | - But recurrence would build on the knowledge somehow.
00:04:55.800 | I mean, it would evolve the knowledge
00:04:59.280 | and expand the amount of information, perhaps,
00:05:02.600 | or useful information within that knowledge.
00:05:04.880 | But is this something that just can emerge with size?
00:05:09.320 | Because it seems like everything we have now is too small.
00:05:11.040 | - Not just.
00:05:11.880 | No, it's not clear.
00:05:13.880 | I mean, how you access and write
00:05:15.720 | into an associative memory in an efficient
00:05:17.600 | way, I mean, sort of the original memory network
00:05:19.760 | maybe had something like the right architecture,
00:05:22.080 | but if you try to scale up a memory network
00:05:25.080 | so that the memory contains all of Wikipedia,
00:05:27.400 | it doesn't quite work.
00:05:28.560 | - Right.
00:05:29.640 | - So there's a need for new ideas there, okay.
00:05:33.200 | But it's not the only form of reasoning.
00:05:34.520 | So there's another form of reasoning,
00:05:35.920 | which is very classical
00:05:38.400 | in some types of AI,
00:05:41.600 | and it's based on, let's call it energy minimization.
00:05:45.440 | Okay, so you have some sort of objective,
00:05:49.480 | some energy function that represents
00:05:51.360 | the quality or the negative quality, okay.
00:05:57.840 | Energy goes up when things get bad
00:05:59.240 | and goes down when things get good.
00:06:01.840 | So let's say you want to figure out,
00:06:04.960 | what gestures do I need to do
00:06:07.480 | to grab an object or walk out the door?
00:06:11.760 | If you have a good model of your own body,
00:06:14.840 | a good model of the environment,
00:06:17.000 | using this kind of energy minimization,
00:06:18.960 | you can do planning.
00:06:21.440 | And in optimal control,
00:06:23.760 | it's called model predictive control.
00:06:26.640 | You have a model of what's gonna happen in the world
00:06:28.640 | as a consequence of your actions.
00:06:30.080 | And that allows you to, by energy minimization,
00:06:33.120 | figure out a sequence of action
00:06:34.360 | that optimizes a particular objective function,
00:06:36.640 | which minimizes the number of times
00:06:38.720 | you're gonna hit something
00:06:39.560 | and the energy you're gonna spend
00:06:41.080 | doing the gesture and et cetera.
00:06:44.360 | So that's a form of reasoning.
00:06:47.000 | Planning is a form of reasoning.
00:06:48.080 | And perhaps what led to the ability of humans to reason
00:06:52.600 | is the fact that we, or species that appeared before us,
00:06:57.600 | had to do some sort of planning
00:06:59.600 | to be able to hunt and survive
00:07:01.520 | and survive the winter in particular.
00:07:04.160 | And so it's the same capacity that you need to have.
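
To make the energy-minimization picture concrete, here is a hedged sketch of model predictive control on a toy one-dimensional point mass. The dynamics, cost terms, horizon, and constants are all assumptions chosen for illustration; the only point is that planning becomes gradient descent on an energy defined over a candidate action sequence.

```python
import numpy as np

def rollout(actions, x0=0.0, v0=0.0, dt=0.1):
    """Predict the positions of a 1-D point mass driven by a sequence of accelerations."""
    x, v, xs = x0, v0, []
    for a in actions:
        v = v + a * dt
        x = x + v * dt
        xs.append(x)
    return np.array(xs)

def energy(actions, goal=1.0):
    """High when the final position misses the goal or when the effort is large."""
    xs = rollout(actions)
    return (xs[-1] - goal) ** 2 + 0.001 * np.sum(actions ** 2)

# Plan by gradient descent on the energy over a 10-step action sequence,
# using central-difference gradients to keep the example dependency-free.
actions = np.zeros(10)
eps, lr = 1e-4, 0.5
for _ in range(200):
    grad = np.array([
        (energy(actions + eps * np.eye(10)[i]) - energy(actions - eps * np.eye(10)[i])) / (2 * eps)
        for i in range(10)
    ])
    actions -= lr * grad

print(rollout(actions)[-1])  # final position ends up close to the goal of 1.0
```
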
00:07:07.880 | - So your intuition is,
00:07:11.000 | if we look at expert systems,
00:07:13.840 | that encoding knowledge as logic systems,
00:07:17.760 | as graphs, or in this kind of way,
00:07:21.280 | is not a useful way to think about knowledge?
00:07:24.840 | - Graphs are a little brittle, or logic representations.
00:07:28.520 | So basically, variables that have values
00:07:32.440 | and then constraints between them
00:07:33.840 | that are represented by rules,
00:07:35.840 | that's a little too rigid and too brittle, right?
00:07:37.400 | So some of the early efforts in that respect
00:07:43.200 | were to put probabilities on them.
00:07:45.560 | So a rule, if you have this and that symptom,
00:07:49.120 | you have this disease with that probability
00:07:51.760 | and you should prescribe that antibiotic
00:07:53.960 | with that probability, right?
00:07:55.080 | That's the MYCIN system from the seventies.
00:07:57.840 | And that's what that branch of AI led to:
00:08:02.080 | Bayesian networks and graphical models
00:08:04.840 | and causal inference and variational methods.
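
As a hedged illustration of the kind of probabilistic rule being described, rather than anything from the actual MYCIN rule base, the snippet below turns "if this symptom, then this disease with some probability" into a Bayes' rule update; every number in it is invented for the example.

```python
# Toy "probabilistic rule": P(disease | symptom) via Bayes' rule.
# Every number here is invented purely for illustration.
p_disease = 0.01                  # prior probability of the disease
p_symptom_given_disease = 0.90    # "if you have the disease, you show the symptom"
p_symptom_given_healthy = 0.05    # false-positive rate of the symptom

p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom

print(round(p_disease_given_symptom, 3))  # ≈ 0.154 with these made-up numbers
```
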
00:08:09.480 | So there is, I mean, certainly a lot of interesting work
00:08:14.480 | going on in this area.
00:08:15.960 | The main issue with this is knowledge acquisition.
00:08:18.400 | How do you reduce a bunch of data to a graph of this type?
00:08:23.400 | - Yeah, it relies on the expert,
00:08:26.360 | on the human being, to encode, to add knowledge.
00:08:29.480 | - And that's essentially impractical.
00:08:31.640 | - Yeah, it's not scalable.
00:08:34.000 | - That's a big question.
00:08:34.840 | The second question is,
00:08:35.960 | do you want to represent knowledge as symbols
00:08:39.160 | and do you want to manipulate them with logic?
00:08:41.760 | And again, that's incompatible with learning.
00:08:43.840 | So one suggestion that Geoff Hinton
00:08:47.680 | has been advocating for many decades
00:08:49.560 | is to replace symbols by vectors.
00:08:53.840 | Think of it as pattern of activities
00:08:55.480 | in a bunch of neurons or units
00:08:57.840 | or whatever you want to call them
00:08:59.640 | and replace logic by continuous functions.
00:09:03.280 | Okay, and that becomes now compatible.
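
A hedged sketch of the suggestion as stated here: symbols become vectors and logical operations become continuous, differentiable functions of them. The embedding size, the sigmoid scoring, and the product t-norm for the soft AND are illustrative choices, not anything specified in the conversation; in a real system the embeddings would be trained by gradient descent rather than drawn at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Symbols as vectors: each discrete symbol gets an embedding (random here,
# learned in a real system).
embeddings = {name: rng.normal(size=16) for name in ["socrates", "human", "mortal"]}

def truth(subject, predicate):
    """Continuous 'truth value' in (0, 1): a differentiable stand-in for a predicate."""
    score = embeddings[subject] @ embeddings[predicate] / np.sqrt(16)
    return 1.0 / (1.0 + np.exp(-score))      # sigmoid keeps the value in (0, 1)

def soft_and(a, b):
    return a * b                              # product t-norm: a continuous AND

def soft_implies(a, b):
    return 1.0 - a + a * b                    # a continuous version of a -> b

# "socrates is human, and being human implies being mortal",
# expressed as a differentiable function of the embeddings.
is_human = truth("socrates", "human")
belief = soft_and(is_human, soft_implies(is_human, truth("socrates", "mortal")))
print(belief)  # arbitrary here because the embeddings are random, but fully differentiable
```
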
00:09:06.360 | There's a very good set of ideas
00:09:09.480 | written in a paper about 10 years ago
00:09:12.160 | by Léon Bottou, who is here at Facebook.
00:09:15.520 | The title of the paper is
00:09:18.920 | "From Machine Learning to Machine Reasoning."
00:09:20.360 | And his idea is that a learning system
00:09:24.000 | should be able to manipulate objects
00:09:25.400 | that are in a space
00:09:27.680 | and then put the result back in the same space.
00:09:29.440 | So it's this idea of working memory basically.
00:09:31.720 | And it's very enlightening.
00:09:35.200 | - And in a sense that might learn something
00:09:38.320 | like the simple expert systems.
00:09:41.480 | I mean, you can learn basic logic operations there.
00:09:46.640 | - Yeah, quite possibly.
00:09:47.960 | There's a big debate on sort of how much prior structure
00:09:51.240 | you have to put in for this kind of stuff to emerge.
00:09:53.640 | That's the debate I have with Gary Marcus
00:09:55.280 | and people like that.
00:09:56.320 | (upbeat music)