Yann LeCun: Benchmarks for Human-Level Intelligence | AI Podcast Clips


Chapters

0:00 Don't get fooled
0:52 Toy problems
2:45 Interactive environments
5:13 Specialization
6:53 Boolean Functions

Whisper Transcript

00:00:00.000 | (gentle music)
00:00:02.580 | - You've written advice saying,
00:00:09.720 | "Don't get fooled by people who claim
00:00:12.120 | "to have a solution to artificial general intelligence,
00:00:14.740 | "who claim to have an AI system
00:00:16.120 | "that works just like the human brain,
00:00:18.460 | "or who claim to have figured out how the brain works.
00:00:21.240 | "Ask them what the error rate they get
00:00:25.140 | "on MNIST or ImageNet."
00:00:27.360 | - Yeah, this is a little dated, by the way. (laughs)
00:00:29.640 | I mean, five years, who's counting?
00:00:32.520 | Okay, but I think your opinion is still,
00:00:35.120 | MNIST and ImageNet, yes, may be dated,
00:00:39.160 | there may be new benchmarks, right?
00:00:40.600 | But I think that philosophy is one you still
00:00:43.600 | somewhat hold, that benchmarks
00:00:47.640 | and the practical testing, the practical application
00:00:49.980 | is where you really get to test the ideas.
00:00:52.240 | - Well, it may not be completely practical.
00:00:54.040 | Like, for example, it could be a toy dataset,
00:00:56.680 | but it has to be some sort of task
00:00:59.080 | that the community as a whole has accepted
00:01:01.520 | as some sort of standard kind of benchmark, if you want.
00:01:04.840 | It doesn't need to be real.
00:01:05.680 | So for example, many years ago here at FAIR,
00:01:08.520 | people, Jason Weston, Antoine Bordes, and a few others
00:01:11.880 | proposed the bAbI tasks, which were kind of a toy problem
00:01:15.380 | to test the ability of machines to reason, actually,
00:01:18.560 | to access working memory and things like this.
00:01:21.180 | And it was very useful, even though it wasn't a real task.
00:01:24.360 | MNIST is kind of halfway a real task.
00:01:27.880 | So, you know, toy problems can be very useful.
00:01:30.280 | It's just that I was really struck by the fact that
00:01:33.800 | a lot of people, particularly a lot of people
00:01:35.400 | with money to invest, would be fooled by people telling them,
00:01:38.640 | oh, we have, you know, the algorithm of the cortex
00:01:41.640 | and you should give us 50 million.
00:01:43.620 | - Yes, absolutely.
00:01:44.460 | So there's a lot of people who try to take advantage
00:01:49.460 | of the hype for business reasons and so on.
00:01:52.480 | But let me sort of talk to this idea
00:01:56.080 | that the new ideas, the ideas that push the field forward
00:02:00.360 | may not yet have a benchmark,
00:02:02.880 | or it may be very difficult to establish a benchmark.
00:02:05.120 | - I agree.
00:02:05.960 | That's part of the process.
00:02:06.800 | Establishing benchmarks is part of the process.
00:02:08.840 | - So what are your thoughts about,
00:02:11.560 | so we have these benchmarks on,
00:02:13.880 | around stuff we can do with images,
00:02:16.520 | from classification to captioning,
00:02:19.160 | to just every kind of information you can pull off
00:02:21.180 | from images and the surface level.
00:02:23.120 | There's audio data sets, there's some video.
00:02:25.680 | Where can we start, natural language,
00:02:29.200 | what kind of stuff, what kind of benchmarks do you see
00:02:32.960 | that start creeping up on something more like intelligence,
00:02:37.880 | like reasoning, like, maybe you don't like the term,
00:02:41.680 | but AGI, echoes of that kind of formulation?
00:02:44.720 | - Yeah, so a lot of people are working on
00:02:47.120 | interactive environments in which you can train
00:02:50.360 | and test intelligent systems.
00:02:52.360 | So there, for example,
00:02:54.460 | the classical paradigm of supervised learning
00:03:00.400 | is that you have a data set,
00:03:02.200 | you partition it into a training set,
00:03:03.720 | validation set, test set,
00:03:04.720 | and there's a clear protocol, right?
00:03:07.280 | But what if, that assumes that the samples
00:03:10.660 | are statistically independent,
00:03:13.120 | you can exchange them,
00:03:14.360 | the order in which you see them shouldn't matter,
00:03:16.520 | things like that.
00:03:17.720 | But what if the answer you give
00:03:19.800 | determines the next sample you see,
00:03:21.840 | which is the case, for example, in robotics, right?
00:03:23.840 | Your robot does something
00:03:25.440 | and then it gets exposed to a new room,
00:03:27.880 | and depending on where it goes,
00:03:29.380 | the room would be different.
00:03:30.240 | So that creates the exploration problem.
00:03:32.700 | And then, the samples,
00:03:37.240 | so that also creates a dependency between samples, right?
00:03:40.040 | If you can only move in space,
00:03:43.880 | the next sample you're gonna see
00:03:45.220 | is gonna be probably in the same building, most likely.
00:03:49.760 | So all the assumptions about the validity
00:03:52.200 | of this training set, test set hypothesis break
00:03:55.920 | whenever a machine can take an action
00:03:57.360 | that has an influence on the world
00:03:59.200 | and on what it's gonna see.
00:04:00.640 | So people are setting up artificial environments
00:04:04.420 | where that takes place, right?
00:04:06.320 | The robot runs around a 3D model of a house
00:04:10.100 | and can interact with objects and things like this.
00:04:12.960 | So you do robotics by simulation,
00:04:14.640 | you have those, you know,
00:04:17.200 | an OpenAI Gym type thing,
00:04:18.680 | or MuJoCo kind of simulated robots,
00:04:23.080 | and you have games, you know, things like that.
00:04:25.520 | So that's where the field is going, really,
00:04:27.900 | this kind of environment.
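
To make the contrast being described here concrete, the following is a minimal sketch of the two protocols; the toy environment, its dynamics, and the policy are invented purely for illustration and are not taken from any system mentioned in the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Classical supervised protocol: samples are i.i.d., order doesn't matter ---
X = rng.normal(size=(1000, 8))            # feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # labels
idx = rng.permutation(len(X))             # shuffling is harmless under i.i.d.
train, val, test = np.split(idx, [700, 850])
print("train/val/test sizes:", len(train), len(val), len(test))

# --- Interactive protocol: the action chosen determines the next observation ---
class ToyRoomEnv:
    """Hypothetical toy 'building': the agent's position drifts with its moves,
    so consecutive observations are strongly correlated (not i.i.d.)."""
    def __init__(self):
        self.position = 0.0

    def step(self, action):               # action in {-1, +1}
        self.position += action + rng.normal(scale=0.1)
        return self.position              # what you see next depends on what you did

env = ToyRoomEnv()
obs_history = []
obs = 0.0
for t in range(20):
    action = 1 if obs < 5.0 else -1       # trivial made-up policy: walk right, then hover
    obs = env.step(action)
    obs_history.append(obs)

# Consecutive observations are highly correlated: shuffling them, or carving them
# into "train" and "test" chunks, no longer gives an unbiased evaluation.
print("lag-1 correlation:", np.corrcoef(obs_history[:-1], obs_history[1:])[0, 1])
```
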
00:04:29.160 | Now, back to the question of AGI,
00:04:32.600 | like, I don't like the term AGI,
00:04:34.200 | because it implies that human intelligence is general,
00:04:40.040 | and human intelligence is nothing like general,
00:04:42.640 | it's very, very specialized.
00:04:45.120 | We think it's general,
00:04:45.960 | we like to think of ourselves as having general intelligence,
00:04:48.080 | we don't, we're very specialized.
00:04:50.360 | We're only slightly more general than--
00:04:51.800 | - Why does it feel general?
00:04:53.160 | So you kind of, the term general,
00:04:56.300 | I think what's impressive about humans
00:04:58.480 | is the ability to learn,
00:05:00.600 | as we were talking about learning,
00:05:02.520 | to learn in just so many different domains.
00:05:05.520 | It's perhaps not arbitrarily general,
00:05:08.680 | but just you can learn in many domains
00:05:10.680 | and integrate that knowledge somehow.
00:05:12.480 | - Okay. - The knowledge persists.
00:05:14.120 | - So let me take a very specific example.
00:05:16.480 | It's not an example,
00:05:17.320 | it's more like a quasi-mathematical demonstration.
00:05:21.360 | So you have about one million fibers
00:05:22.800 | coming out of one of your eyes,
00:05:24.680 | okay, two million total,
00:05:25.600 | but let's talk about just one of them.
00:05:27.720 | It's one million nerve fibers, your optical nerve.
00:05:30.320 | Let's imagine that they are binary,
00:05:33.040 | so they can be active or inactive, right?
00:05:34.880 | So the input to your visual cortex is one million bits.
00:05:38.300 | Now, they're connected to your brain in a particular way,
00:05:43.640 | and your brain has connections
00:05:46.200 | that are kind of a little bit like a convolutional net,
00:05:48.400 | they're kind of local, you know,
00:05:50.520 | in space and things like this.
00:05:52.200 | Now imagine I play a trick on you.
00:05:53.960 | It's a pretty nasty trick, I admit.
00:05:57.280 | I cut your optical nerve,
00:05:59.960 | and I put a device that makes a random perturbation,
00:06:03.400 | a permutation of all the nerve fibers.
00:06:05.360 | So now what comes to your brain
00:06:08.840 | is a fixed but random permutation of all the pixels.
00:06:13.400 | There's no way in hell that your visual cortex,
00:06:15.620 | even if I do this to you in infancy,
00:06:19.000 | will actually learn vision
00:06:20.760 | to the same level of quality that you can.
00:06:24.280 | - Got it, and you're saying
00:06:25.360 | there's no way you'd relearn that?
00:06:26.960 | - No, because now two pixels that are nearby in the world
00:06:29.880 | will end up in very different places in your visual cortex.
00:06:33.480 | And your neurons there have no connections with each other
00:06:37.740 | because they're only connected locally.
00:06:37.740 | - So this whole, our entire,
00:06:39.280 | the hardware is built in many ways to support?
00:06:42.860 | - The locality of the real world.
00:06:44.440 | Yes, that's specialization.
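
As a concrete sketch of the fixed-permutation thought experiment: the image size, pixel coordinates, and 3x3 window below are arbitrary choices made for illustration, not anything specified in the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 32                       # a small stand-in for the ~1e6-fiber optic nerve
n = H * W
perm = rng.permutation(n)        # one fixed, random rewiring, applied forever

def scramble(image):
    """Apply the same fixed permutation to every image (the 'cut nerve' trick)."""
    return image.reshape(-1)[perm].reshape(H, W)

# Where do two pixels that are neighbors in the world end up after scrambling?
flat_a = 10 * W + 10             # pixel (10, 10)
flat_b = 10 * W + 11             # its immediate right neighbor (10, 11)
inv = np.argsort(perm)           # inverse permutation: world index -> new position
ra, ca = divmod(inv[flat_a], W)
rb, cb = divmod(inv[flat_b], W)
print(f"neighbors (10,10) and (10,11) land at ({ra},{ca}) and ({rb},{cb})")

# A convolution only mixes values within a small window (say 3x3). After the
# permutation, world-neighbors almost never fall in the same window, so the
# locality prior that convolutional nets (and visual cortex) rely on is gone.
print("still within a 3x3 window:", abs(ra - rb) <= 1 and abs(ca - cb) <= 1)
```
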
00:06:46.840 | - Yeah, but it's still pretty damn impressive.
00:06:48.840 | So it's not perfect generalization.
00:06:50.480 | It's not even close.
00:06:51.320 | - No, no, it's not that it's not even close.
00:06:54.240 | It's not at all.
00:06:55.240 | - Yeah, it's not, it's specialized.
00:06:56.480 | - So how many Boolean functions?
00:06:58.280 | So let's imagine you want to train your visual system
00:07:02.520 | to recognize particular patterns of those one million bits.
00:07:07.520 | Okay, so that's a Boolean function, right?
00:07:10.040 | Either the pattern is here or not here.
00:07:13.460 | It's a two-way classification
00:07:13.460 | with one million binary inputs.
00:07:15.940 | How many such Boolean functions are there?
00:07:20.540 | Okay, you have two to the one million
00:07:23.280 | combinations of inputs.
00:07:25.440 | For each of those, you have an output bit.
00:07:28.320 | And so you have two to the two-to-the-one-million
00:07:31.080 | Boolean functions of this type, okay?
00:07:34.300 | Which is an unimaginably large number.
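
Written out, the count being gestured at here is a double exponential:

\[
\underbrace{2^{10^{6}}}_{\text{possible input patterns on } 10^{6} \text{ binary fibers}}
\quad\Longrightarrow\quad
\underbrace{2^{\,2^{10^{6}}}}_{\text{Boolean functions (one output bit per input pattern)}}
\]
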
00:07:37.280 | How many of those functions
00:07:38.800 | can actually be computed by your visual cortex?
00:07:41.520 | And the answer is a tiny, tiny, tiny, tiny, tiny, tiny sliver
00:07:45.720 | like an enormously tiny sliver.
00:07:47.760 | - Yeah, yeah.
00:07:49.240 | - So we are ridiculously specialized.
00:07:51.560 | - But, okay, that's an argument against the word general.
00:07:58.120 | I think there's a, I agree with your intuition,
00:08:03.400 | but I'm not sure it's,
00:08:05.200 | it seems the brain is impressively
00:08:09.040 | capable of adjusting to things.
00:08:13.880 | - It's because we can't imagine tasks
00:08:17.640 | that are outside of our comprehension, right?
00:08:20.480 | So we think we are general
00:08:22.280 | because we are general of all the things
00:08:23.520 | that we can apprehend.
00:08:25.000 | But there is a huge world out there
00:08:27.240 | of things that we have no idea.
00:08:28.960 | We call that heat, by the way.
00:08:31.080 | - Heat. - Heat.
00:08:31.980 | So, at least physicists call that heat,
00:08:34.880 | or they call it entropy, which is--
00:08:36.040 | - Entropy.
00:08:36.880 | - You have a thing full of gas, right?
00:08:42.800 | - Closed system full of gas.
00:08:44.960 | - Right?
00:08:45.960 | Closed or not closed.
00:08:46.840 | It has pressure, it has temperature,
00:08:51.840 | it has, and you can write equations,
00:08:56.000 | PV = nRT, things like that, right?
00:08:58.400 | When you reduce the volume, the temperature goes up,
00:09:01.640 | the pressure goes up, things like that, right?
00:09:04.600 | For a perfect gas, at least.
00:09:06.440 | Those are the things you can know about that system.
00:09:09.720 | And it's a tiny, tiny number of bits
00:09:11.280 | compared to the complete information
00:09:13.640 | of the state of the entire system.
00:09:15.080 | Because the state of the entire system
00:09:16.440 | will give you the position and momentum
00:09:18.000 | of every molecule of the gas.
00:09:20.980 | And what you don't know about it is the entropy,
00:09:25.160 | and you interpret it as heat.
00:09:28.240 | The energy contained in that thing is what we call heat.
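
The standard identities behind this analogy, not spelled out in the conversation beyond the ideal gas law, are:

\[
PV = nRT
\qquad\text{and}\qquad
S = k_{B}\ln\Omega,
\]

where the left-hand side involves only a few macroscopic numbers (pressure, volume, temperature, amount of gas), while \(\Omega\) counts the astronomically many microstates, the positions and momenta of every molecule, that are compatible with them.
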
00:09:32.280 | Now, it's very possible that, in fact,
00:09:36.480 | there is some very strong structure
00:09:37.920 | in how those molecules are moving.
00:09:39.240 | It's just that they are in a way
00:09:40.680 | that we are just not wired to perceive.
00:09:43.320 | - Yeah, we're ignorant to it.
00:09:44.320 | And there's an infinite amount of things
00:09:48.080 | we're not wired to perceive.
00:09:49.640 | And you're right, that's a nice way to put it.
00:09:51.560 | We're general to all the things we can imagine,
00:09:54.420 | which is a very tiny subset of all things that are possible.
00:09:59.040 | - So it's like Kolmogorov complexity
00:10:00.520 | or the Kolmogorov-Chaitin-Solomonoff complexity.
00:10:02.820 | Every bit string or every integer is random,
00:10:09.800 | except for all the ones that you can actually write down.
00:10:12.360 | (both laughing)
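
The counting argument behind that claim, in standard Kolmogorov-complexity notation (the constant \(c\) is arbitrary):

\[
\#\{\,x \in \{0,1\}^{n} : K(x) < n - c\,\} \;\le\; 2^{\,n-c} - 1 \;<\; 2^{-c}\cdot 2^{n},
\]

since there are fewer than \(2^{\,n-c}\) programs shorter than \(n-c\) bits. So all but a \(2^{-c}\) fraction of \(n\)-bit strings are incompressible ("random"), while any string you can actually write down compactly is, by that very description, one of the rare compressible ones.
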
00:10:15.800 | - Yeah, okay, so beautifully put.
00:10:17.080 | But so we can just call it artificial intelligence.
00:10:21.240 | We don't need to have the "general."
00:10:21.240 | - Or human level.
00:10:23.000 | Human level intelligence is good.
00:10:27.560 | Anytime you touch human, it gets interesting
00:10:30.840 | because we attach ourselves to human
00:10:35.840 | and it's difficult to define what human intelligence is.
00:10:40.440 | Nevertheless, my definition is maybe
00:10:44.160 | damn impressive intelligence.
00:10:47.480 | Okay, damn impressive demonstration of intelligence,
00:10:50.200 | whatever.
00:10:51.040 | (upbeat music)