
Yann LeCun: Can Neural Networks Reason? | AI Podcast Clips


Chapters

0:00 Can Neural Networks Reason
0:15 Discrete Reasoning
2:15 Working Memory
3:44 Transformer
5:18 Energy minimization

Transcript

(gentle music) - Do you think neural networks can be made to reason? - Yes, there is no question about that. Again, we have a good example, right? The question is how. So the question is how much prior structure do you have to put in the neural net so that something like human reasoning will emerge from it, from learning?

Another question is that all of our kind of models of what reasoning is that are based on logic are discrete, and are therefore incompatible with gradient-based learning. And I'm a very strong believer in this idea of gradient-based learning. I don't really believe in other types of learning that don't use kind of gradient information, if you want.

- So you don't like discrete mathematics? You don't like anything discrete? - Well, it's not that I don't like it. It's just that it's incompatible with learning and I'm a big fan of learning, right? So in fact, that's perhaps one reason why deep learning has been kind of looked at with suspicion by a lot of computer scientists, because the math is very different.

The math that you use for deep learning, it kind of has more to do with cybernetics, the kind of math you do in electrical engineering than the kind of math you do in computer science. And nothing in machine learning is exact, right? Computer science is all about sort of obsessive-compulsive attention to details of like every index has to be right.

And you can prove that an algorithm is correct, right? Machine learning is the science of sloppiness, really. - That's beautiful. So, okay, maybe let's feel around in the dark of what is a neural network that reasons, or a system that works with continuous functions that's able to build knowledge, however we think about reasoning, build on previous knowledge, build on extra knowledge, create new knowledge, generalize outside of any training set ever built.

What does that look like? Maybe, do you have inklings of thoughts of what that might look like? - Yeah, I mean, yes and no. If I had precise ideas about this, I think we'd be building it right now. And there are people working on this whose main research interest is actually exactly that, right?

So what you need to have is a working memory. So you need to have some device, if you want, some subsystem that can store a relatively large amount of factual, episodic information for a reasonable amount of time. So in the brain, for example, there are kind of three main types of memory.

One is the sort of memory of the state of your cortex. And that sort of disappears within 20 seconds. You can't remember things for more than about 20 seconds or a minute if you don't have any other form of memory. The second type of memory, which is longer term, but still short term, is the hippocampus.

So you can, you know, you came into this building, you remember where the exit is, where the elevators are. You have some map of that building that's stored in your hippocampus. You might remember something about what I said, you know, a few minutes ago. - I forgot it all already, but...

- It's been erased, but, you know, that would be in your hippocampus. And then the longer-term memory is in the synapses, right? So what you need, if you want a system that's capable of reasoning, is the hippocampus-like thing, right? And that's what people have tried to do with memory networks and, you know, Neural Turing Machines and stuff like that, right?

And now with transformers, which have sort of a memory in their kind of self-attention system, you can think of it this way. So that's one element you need. Another thing you need is some sort of network that can access this memory, get some information back, and then kind of crunch on it, and then do this iteratively multiple times, because a chain of reasoning is a process by which you update your knowledge about the state of the world, about what's going to happen, et cetera.

And that has to be this sort of recurrent operation, basically. - And you think that kind of, if we think about a transformer, that seems to be too small to represent the knowledge that's contained in Wikipedia, for example. - Well, a transformer doesn't have this idea of recurrence.

It's got a fixed number of layers, and that's the number of steps that limits, basically, its representation. - But recurrence would build on the knowledge somehow. I mean, it would evolve the knowledge and expand the amount of information, perhaps, or useful information within that knowledge. But is this something that just can emerge with size?

Because it seems like everything we have now is too small. - Not just, no, it's not clear. I mean, how do you access and write into an associative memory in an efficient way? I mean, sort of the original memory network maybe had something like the right architecture, but if you try to scale up a memory network so that the memory contains all of Wikipedia, it doesn't quite work.
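A minimal sketch of the read-and-update loop being discussed, hedged: the shapes, the dot-product addressing, and the tanh update are illustrative assumptions, not the architecture of memory networks or Neural Turing Machines. The point is only that a weight-tied controller can query an associative memory and iterate, which is the kind of recurrence a fixed-depth transformer lacks.

```python
# Illustrative sketch only: a weight-tied controller repeatedly queries an
# associative memory with soft, content-based attention, "crunches" on what
# it read, and folds the result back into its state.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reason_over_memory(memory, query, W_update, n_steps=4):
    """memory: (n_slots, d) stored facts; query: (d,) initial state."""
    state = query
    for _ in range(n_steps):                      # same weights at every step
        weights = softmax(memory @ state)         # address memory by similarity
        read = weights @ memory                   # (d,) softly retrieved content
        state = np.tanh(W_update @ np.concatenate([state, read]))
    return state

rng = np.random.default_rng(0)
d, n_slots = 8, 16
memory = rng.standard_normal((n_slots, d))        # stand-in for stored facts
W_update = 0.1 * rng.standard_normal((d, 2 * d))
out = reason_over_memory(memory, rng.standard_normal(d), W_update)
```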

- Right. - So there's a need for new ideas there, okay. But it's not the only form of reasoning. So there's another form of reasoning, which is very classical in some types of AI, and it's based on, let's call it, energy minimization. Okay, so you have some sort of objective, some energy function that represents the quality, or the negative quality, okay.

Energy goes up when things get bad and it goes down when things get good. So let's say you want to figure out, what gestures do I need to do to grab an object or walk out the door? If you have a good model of your own body, a good model of the environment, using this kind of energy minimization, you can do planning.

And in optimal control, it's called model predictive control. You have a model of what's gonna happen in the world as a consequence of your actions. And that allows you to, by energy minimization, figure out a sequence of actions that optimizes a particular objective function, which minimizes the number of times you're gonna hit something and the energy you're gonna spend doing the gesture, et cetera.
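A toy instance of planning as energy minimization, under made-up assumptions (trivial additive dynamics, a quadratic goal term, a small action-energy penalty); it is not model predictive control as used in practice, just a demonstration that a differentiable model of the consequences of your actions lets you optimize an action sequence by gradient descent.

```python
# Toy planning-by-energy-minimization sketch. The dynamics (x_{t+1} = x_t + a_t),
# cost weights, horizon, and learning rate are illustrative assumptions.
import numpy as np

def plan(x0, goal, horizon=10, iters=500, lr=0.02, action_cost=0.1):
    actions = np.zeros((horizon, 2))               # 2-D displacements to optimize
    for _ in range(iters):
        x_final = x0 + actions.sum(axis=0)         # model: where the actions lead
        # Energy = ||x_final - goal||^2 + action_cost * sum_t ||a_t||^2
        grad = 2.0 * (x_final - goal) + 2.0 * action_cost * actions  # dE/da_t
        actions -= lr * grad                        # gradient step on the plan
    return actions

plan_out = plan(x0=np.array([0.0, 0.0]), goal=np.array([3.0, 1.0]))
print(plan_out.sum(axis=0))   # total displacement ends up close to the goal
```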

So that's a form of reasoning. Planning is a form of reasoning. And perhaps what led to the ability of humans to reason is the fact that species that appeared before us had to do some sort of planning to be able to hunt and survive, and survive the winter in particular.

And so it's the same capacity that you need to have. - So your intuition is that, if we look at expert systems, encoding knowledge as logic systems, as graphs, in this kind of way, is not a useful way to think about knowledge? - Graphs are a little brittle, or logic representations are.

So basically, variables that have values, and then constraints between them that are represented by rules, is a little too rigid and too brittle, right? So some of the early efforts in that respect were to put probabilities on them. So a rule: if you have this and that symptom, you have this disease with that probability, and you should prescribe that antibiotic with that probability, right?
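A toy rendering of such a rule-with-confidence: the rule names, the numbers, and the combination formula below are illustrative assumptions in the style of certainty factors, not a faithful reproduction of any particular system.

```python
# Toy illustration of rules with attached confidences (hypothetical rules).
def combine(cf1, cf2):
    """Combine two positive confidences for the same conclusion."""
    return cf1 + cf2 * (1.0 - cf1)

rules = [
    # (required symptoms, conclusion, confidence of the rule)
    ({"fever", "cough"}, "infection_x", 0.6),
    ({"fever", "rash"}, "infection_x", 0.4),
]

def diagnose(symptoms):
    confidence = {}
    for required, conclusion, cf in rules:
        if required <= symptoms:                   # all required symptoms present
            confidence[conclusion] = combine(confidence.get(conclusion, 0.0), cf)
    return confidence

print(diagnose({"fever", "cough", "rash"}))        # {'infection_x': 0.76}
```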

That's the MYCIN system from the seventies. And that branch of AI led to Bayesian networks and graphical models and causal inference and variational methods. So there is, I mean, certainly a lot of interesting work going on in this area. The main issue with this is knowledge acquisition.

How do you reduce a bunch of data to a graph of this type? - Yeah, it relies on the expert, on the human being, to encode it, to add knowledge. - And that's essentially impractical. - Yeah, it's not scalable. - That's a big question. The second question is, do you want to represent knowledge as symbols, and do you want to manipulate them with logic?

And again, that's incompatible with learning. So one suggestion, which Geoff Hinton has been advocating for many decades, is to replace symbols by vectors. Think of it as patterns of activity in a bunch of neurons or units, or whatever you want to call them, and replace logic by continuous functions. Okay, and that now becomes compatible.
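A minimal sketch of the "symbols as vectors, logic as continuous functions" idea; the embedding size, the bilinear scoring, and the soft AND are assumptions made for the example, not a specific proposal from the conversation.

```python
# Illustrative sketch: symbols become vectors, and a discrete predicate like
# is_a(x, y) becomes a differentiable scoring function with a graded output.
import numpy as np

rng = np.random.default_rng(0)
d = 16
symbols = {name: rng.standard_normal(d) for name in ["socrates", "human", "mortal"]}

W_is_a = 0.1 * rng.standard_normal((d, d))          # would be learned in practice
def is_a(x, y):
    score = symbols[x] @ W_is_a @ symbols[y]
    return 1.0 / (1.0 + np.exp(-score))             # graded truth value in (0, 1)

def soft_and(a, b):
    return a * b                                    # continuous stand-in for AND

# Soft version of: human(socrates) AND is_a(human, mortal) supports mortal(socrates).
support = soft_and(is_a("socrates", "human"), is_a("human", "mortal"))
print(support)   # gradients can flow through every step of this "reasoning"
```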

There's a very good set of ideas written in a paper about 10 years ago by Léon Bottou, who is here at Facebook. The title of the paper is "From Machine Learning to Machine Reasoning." And his idea is that a learning system should be able to manipulate objects that are in a space and then put the result back in the same space.

So it's this idea of working memory basically. And it's very enlightening. - And in a sense that might learn something like the simple expert systems. I mean, you can learn basic logic operations there. - Yeah, quite possibly. There's a big debate on sort of how much prior structure you have to put in for this kind of stuff to emerge.

That's the debate I have with Gary Marcus and people like that. (upbeat music)