Yann LeCun: Benchmarks for Human-Level Intelligence | AI Podcast Clips


Chapters

0:00 Don't get fooled
0:52 Toy problems
2:45 Interactive environments
5:13 Specialization
6:53 Boolean Functions

Whisper Transcript

00:00:00.000 | (gentle music)
00:00:02.580 | - You've written advice saying,
00:00:09.720 | "Don't get fooled by people who claim
00:00:12.120 | "to have a solution to artificial general intelligence,
00:00:14.740 | "who claim to have an AI system
00:00:16.120 | "that works just like the human brain,
00:00:18.460 | "or who claim to have figured out how the brain works.
00:00:21.240 | "Ask them what the error rate they get
00:00:25.140 | "on MNIST or ImageNet."
00:00:27.360 | - Yeah, this is a little dated, by the way. (laughs)
00:00:29.640 | I mean, five years, who's counting?
00:00:32.520 | Okay, but I think your opinion is still,
00:00:35.120 | MNIST and ImageNet, yes, may be dated,
00:00:39.160 | there may be new benchmarks, right?
00:00:40.600 | But I think that philosophy is one you still
00:00:43.600 | somewhat hold, that benchmarks
00:00:47.640 | and the practical testing, the practical application
00:00:49.980 | is where you really get to test the ideas.
00:00:52.240 | - Well, it may not be completely practical.
00:00:54.040 | Like, for example, it could be a toy dataset,
00:00:56.680 | but it has to be some sort of task
00:00:59.080 | that the community as a whole has accepted
00:01:01.520 | as some sort of standard kind of benchmark, if you want.
00:01:04.840 | It doesn't need to be real.
00:01:05.680 | So for example, many years ago here at FAIR,
00:01:08.520 | people, Jason Weston, Antoine Bordes, and a few others
00:01:11.880 | proposed the bAbI tasks, which were kind of a toy problem
00:01:15.380 | to test the ability of machines to reason, actually,
00:01:18.560 | to access working memory and things like this.
00:01:21.180 | And it was very useful, even though it wasn't a real task.
00:01:24.360 | MNIST is kind of halfway a real task.
00:01:27.880 | So, you know, toy problems can be very useful.
00:01:30.280 | It's just that I was really struck by the fact that
00:01:33.800 | a lot of people, particularly a lot of people
00:01:35.400 | with money to invest, would be fooled by people telling them,
00:01:38.640 | oh, we have, you know, the algorithm of the cortex
00:01:41.640 | and you should give us 50 million.
00:01:43.620 | - Yes, absolutely.
00:01:44.460 | So there's a lot of people who try to take advantage
00:01:49.460 | of the hype for business reasons and so on.
00:01:52.480 | But let me sort of talk to this idea
00:01:56.080 | that the new ideas, the ideas that push the field forward
00:02:00.360 | may not yet have a benchmark,
00:02:02.880 | or it may be very difficult to establish a benchmark.
00:02:05.120 | - I agree.
00:02:05.960 | That's part of the process.
00:02:06.800 | Establishing benchmarks is part of the process.
00:02:08.840 | - So what are your thoughts about,
00:02:11.560 | so we have these benchmarks on,
00:02:13.880 | around stuff we can do with images,
00:02:16.520 | from classification to captioning,
00:02:19.160 | to just every kind of information you can pull off
00:02:21.180 | from images and the surface level.
00:02:23.120 | There's audio data sets, there's some video.
00:02:25.680 | Where can we start, natural language,
00:02:29.200 | what kind of stuff, what kind of benchmarks do you see
00:02:32.960 | that start creeping up on something more like intelligence,
00:02:37.880 | like reasoning, like, maybe you don't like the term,
00:02:41.680 | but AGI, echoes of that kind of formulation?
00:02:44.720 | - Yeah, so a lot of people are working on
00:02:47.120 | interactive environments in which you can train
00:02:50.360 | and test intelligent systems.
00:02:52.360 | So there, for example,
00:02:54.460 | the classical paradigm of supervised learning
00:03:00.400 | is that you have a data set,
00:03:02.200 | you partition it into a training set,
00:03:03.720 | validation set, test set,
00:03:04.720 | and there's a clear protocol, right?
00:03:07.280 | But what if, that assumes that the samples
00:03:10.660 | are statistically independent,
00:03:13.120 | you can exchange them,
00:03:14.360 | the order in which you see them shouldn't matter,
00:03:16.520 | things like that.
00:03:17.720 | But what if the answer you give
00:03:19.800 | determines the next sample you see,
00:03:21.840 | which is the case, for example, in robotics, right?
00:03:23.840 | Your robot does something
00:03:25.440 | and then it gets exposed to a new room,
00:03:27.880 | and depending on where it goes,
00:03:29.380 | the room would be different.
00:03:30.240 | So that creates the exploration problem.
00:03:32.700 | And then, the samples,
00:03:37.240 | so that also creates a dependency between samples, right?
00:03:40.040 | If you can only move in space,
00:03:43.880 | the next sample you're gonna see
00:03:45.220 | is gonna be probably in the same building, most likely.
00:03:49.760 | So all the assumptions about the validity
00:03:52.200 | of this training set, test set hypothesis break
00:03:55.920 | whenever a machine can take an action
00:03:57.360 | that has an influence on the world
00:03:59.200 | and on what it's gonna see.
00:04:00.640 | So people are setting up artificial environments
00:04:04.420 | where that takes place, right?
00:04:06.320 | The robot runs around a 3D model of a house
00:04:10.100 | and can interact with objects and things like this.
00:04:12.960 | So you do robotics by simulation,
00:04:14.640 | you have those, you know,
00:04:17.200 | an OpenAI Gym type thing,
00:04:18.680 | or MuJoCo kind of simulated robots,
00:04:23.080 | and you have games, you know, things like that.
00:04:25.520 | So that's where the field is going, really,
00:04:27.900 | this kind of environment.
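
To make the contrast being described here concrete, the following is a minimal sketch of the two protocols; the toy environment, its dynamics, and the policy are invented purely for illustration and are not taken from any system mentioned in the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Classical supervised protocol: samples are i.i.d., order doesn't matter ---
X = rng.normal(size=(1000, 8))            # feature vectors
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # labels
idx = rng.permutation(len(X))             # shuffling is harmless under i.i.d.
train, val, test = np.split(idx, [700, 850])
print("train/val/test sizes:", len(train), len(val), len(test))

# --- Interactive protocol: the action chosen determines the next observation ---
class ToyRoomEnv:
    """Hypothetical toy 'building': the agent's position drifts with its moves,
    so consecutive observations are strongly correlated (not i.i.d.)."""
    def __init__(self):
        self.position = 0.0

    def step(self, action):               # action in {-1, +1}
        self.position += action + rng.normal(scale=0.1)
        return self.position              # what you see next depends on what you did

env = ToyRoomEnv()
obs_history = []
obs = 0.0
for t in range(20):
    action = 1 if obs < 5.0 else -1       # trivial made-up policy: walk right, then hover
    obs = env.step(action)
    obs_history.append(obs)

# Consecutive observations are highly correlated: shuffling them, or carving them
# into "train" and "test" chunks, no longer gives an unbiased evaluation.
print("lag-1 correlation:", np.corrcoef(obs_history[:-1], obs_history[1:])[0, 1])
```
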
00:04:29.160 | Now, back to the question of AGI,
00:04:32.600 | like, I don't like the term AGI,
00:04:34.200 | because it implies that human intelligence is general,
00:04:40.040 | and human intelligence is nothing like general,
00:04:42.640 | it's very, very specialized.
00:04:45.120 | We think it's general,
00:04:45.960 | we like to think of ourselves as having general intelligence,
00:04:48.080 | we don't, we're very specialized.
00:04:50.360 | We're only slightly more general than--
00:04:51.800 | - Why does it feel general?
00:04:53.160 | So you kind of, the term general,
00:04:56.300 | I think what's impressive about humans
00:04:58.480 | is the ability to learn,
00:05:00.600 | as we were talking about learning,
00:05:02.520 | to learn in just so many different domains.
00:05:05.520 | It's perhaps not arbitrarily general,
00:05:08.680 | but just you can learn in many domains
00:05:10.680 | and integrate that knowledge somehow.
00:05:12.480 | - Okay. - The knowledge persists.
00:05:14.120 | - So let me take a very specific example.
00:05:16.480 | It's not an example,
00:05:17.320 | it's more like a quasi-mathematical demonstration.
00:05:21.360 | So you have about one million fibers
00:05:22.800 | coming out of one of your eyes,
00:05:24.680 | okay, two million total,
00:05:25.600 | but let's talk about just one of them.
00:05:27.720 | It's one million nerve fibers, your optical nerve.
00:05:30.320 | Let's imagine that they are binary,
00:05:33.040 | so they can be active or inactive, right?
00:05:34.880 | So the input to your visual cortex is one million bits.
00:05:38.300 | Now, they're connected to your brain in a particular way,
00:05:43.640 | and your brain has connections
00:05:46.200 | that are kind of a little bit like a convolutional net,
00:05:48.400 | they're kind of local, you know,
00:05:50.520 | in space and things like this.
00:05:52.200 | Now imagine I play a trick on you.
00:05:53.960 | It's a pretty nasty trick, I admit.
00:05:57.280 | I cut your optical nerve,
00:05:59.960 | and I put a device that makes a random perturbation,
00:06:03.400 | a permutation of all the nerve fibers.
00:06:05.360 | So now what comes to your brain
00:06:08.840 | is a fixed but random permutation of all the pixels.
00:06:13.400 | There's no way in hell that your visual cortex,
00:06:15.620 | even if I do this to you in infancy,
00:06:19.000 | will actually learn vision
00:06:20.760 | to the same level of quality that you can.
00:06:24.280 | - Got it, and you're saying
00:06:25.360 | there's no way you'd relearn that?
00:06:26.960 | - No, because now two pixels that are nearby in the world
00:06:29.880 | will end up in very different places in your visual cortex.
00:06:33.480 | And your neurons there have no connections with each other
00:06:37.740 | because they're only connected locally.
00:06:37.740 | - So this whole, our entire,
00:06:39.280 | the hardware is built in many ways to support?
00:06:42.860 | - The locality of the real world.
00:06:44.440 | Yes, that's specialization.
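
As a concrete sketch of the fixed-permutation thought experiment: the image size, pixel coordinates, and 3x3 window below are arbitrary choices made for illustration, not anything specified in the conversation.

```python
import numpy as np

rng = np.random.default_rng(0)

H = W = 32                       # a small stand-in for the ~1e6-fiber optic nerve
n = H * W
perm = rng.permutation(n)        # one fixed, random rewiring, applied forever

def scramble(image):
    """Apply the same fixed permutation to every image (the 'cut nerve' trick)."""
    return image.reshape(-1)[perm].reshape(H, W)

# Where do two pixels that are neighbors in the world end up after scrambling?
flat_a = 10 * W + 10             # pixel (10, 10)
flat_b = 10 * W + 11             # its immediate right neighbor (10, 11)
inv = np.argsort(perm)           # inverse permutation: world index -> new position
ra, ca = divmod(inv[flat_a], W)
rb, cb = divmod(inv[flat_b], W)
print(f"neighbors (10,10) and (10,11) land at ({ra},{ca}) and ({rb},{cb})")

# A convolution only mixes values within a small window (say 3x3). After the
# permutation, world-neighbors almost never fall in the same window, so the
# locality prior that convolutional nets (and visual cortex) rely on is gone.
print("still within a 3x3 window:", abs(ra - rb) <= 1 and abs(ca - cb) <= 1)
```
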
00:06:46.840 | - Yeah, but it's still pretty damn impressive.
00:06:48.840 | So it's not perfect generalization.
00:06:50.480 | It's not even close.
00:06:51.320 | - No, no, it's not that it's not even close.
00:06:54.240 | It's not at all.
00:06:55.240 | - Yeah, it's not, it's specialized.
00:06:56.480 | - So how many Boolean functions?
00:06:58.280 | So let's imagine you want to train your visual system
00:07:02.520 | to recognize particular patterns of those one million bits.
00:07:07.520 | Okay, so that's a Boolean function, right?
00:07:10.040 | Either the pattern is here or not here.
00:07:13.460 | It's a two-way classification
00:07:13.460 | with one million binary inputs.
00:07:15.940 | How many such Boolean functions are there?
00:07:20.540 | Okay, you have two to the one million
00:07:23.280 | combinations of inputs.
00:07:25.440 | For each of those, you have an output bit.
00:07:28.320 | And so you have two to the two-to-the-one-million
00:07:31.080 | Boolean functions of this type, okay?
00:07:34.300 | Which is an unimaginably large number.
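
Written out, the count being gestured at here is a double exponential:

\[
\underbrace{2^{10^{6}}}_{\text{possible input patterns on } 10^{6} \text{ binary fibers}}
\quad\Longrightarrow\quad
\underbrace{2^{\,2^{10^{6}}}}_{\text{Boolean functions (one output bit per input pattern)}}
\]
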
00:07:37.280 | How many of those functions
00:07:38.800 | can actually be computed by your visual cortex?
00:07:41.520 | And the answer is a tiny, tiny, tiny, tiny, tiny, tiny sliver
00:07:45.720 | like an enormously tiny sliver.
00:07:47.760 | - Yeah, yeah.
00:07:49.240 | - So we are ridiculously specialized.
00:07:51.560 | - But, okay, that's an argument against the word general.
00:07:58.120 | I think there's a, I agree with your intuition,
00:08:03.400 | but I'm not sure it's,
00:08:05.200 | it seems the brain is impressively
00:08:09.040 | capable of adjusting to things.
00:08:13.880 | - It's because we can't imagine tasks
00:08:17.640 | that are outside of our comprehension, right?
00:08:20.480 | So we think we are general
00:08:22.280 | because we are general of all the things
00:08:23.520 | that we can apprehend.
00:08:25.000 | But there is a huge world out there
00:08:27.240 | of things that we have no idea.
00:08:28.960 | We call that heat, by the way.
00:08:31.080 | - Heat. - Heat.
00:08:31.980 | So, at least physicists call that heat,
00:08:34.880 | or they call it entropy, which is--
00:08:36.040 | - Entropy.
00:08:36.880 | - You have a thing full of gas, right?
00:08:42.800 | - Closed system full of gas.
00:08:44.960 | - Right?
00:08:45.960 | Closed or not closed.
00:08:46.840 | It has pressure, it has temperature,
00:08:51.840 | it has, and you can write equations,
00:08:56.000 | PV = nRT, things like that, right?
00:08:58.400 | When you reduce the volume, the temperature goes up,
00:09:01.640 | the pressure goes up, things like that, right?
00:09:04.600 | For a perfect gas, at least.
00:09:06.440 | Those are the things you can know about that system.
00:09:09.720 | And it's a tiny, tiny number of bits
00:09:11.280 | compared to the complete information
00:09:13.640 | of the state of the entire system.
00:09:15.080 | Because the state of the entire system
00:09:16.440 | will give you the position and momentum
00:09:18.000 | of every molecule of the gas.
00:09:20.980 | And what you don't know about it is the entropy,
00:09:25.160 | and you interpret it as heat.
00:09:28.240 | The energy contained in that thing is what we call heat.
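
The standard identities behind this analogy, not spelled out in the conversation beyond the ideal gas law, are:

\[
PV = nRT
\qquad\text{and}\qquad
S = k_{B}\ln\Omega,
\]

where the left-hand side involves only a few macroscopic numbers (pressure, volume, temperature, amount of gas), while \(\Omega\) counts the astronomically many microstates, the positions and momenta of every molecule, that are compatible with them.
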
00:09:32.280 | Now, it's very possible that, in fact,
00:09:36.480 | there is some very strong structure
00:09:37.920 | in how those molecules are moving.
00:09:39.240 | It's just that they are in a way
00:09:40.680 | that we are just not wired to perceive.
00:09:43.320 | - Yeah, we're ignorant to it.
00:09:44.320 | And there's an infinite amount of things
00:09:48.080 | we're not wired to perceive.
00:09:49.640 | And you're right, that's a nice way to put it.
00:09:51.560 | We're general to all the things we can imagine,
00:09:54.420 | which is a very tiny subset of all things that are possible.
00:09:59.040 | - So it's like Kolmogorov complexity
00:10:00.520 | or the Kolmogorov-Chaitin-Solomonoff complexity.
00:10:02.820 | Every bit string or every integer is random,
00:10:09.800 | except for all the ones that you can actually write down.
00:10:12.360 | (both laughing)
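
The counting argument behind that claim, in standard Kolmogorov-complexity notation (the constant \(c\) is arbitrary):

\[
\#\{\,x \in \{0,1\}^{n} : K(x) < n - c\,\} \;\le\; 2^{\,n-c} - 1 \;<\; 2^{-c}\cdot 2^{n},
\]

since there are fewer than \(2^{\,n-c}\) programs shorter than \(n-c\) bits. So all but a \(2^{-c}\) fraction of \(n\)-bit strings are incompressible ("random"), while any string you can actually write down compactly is, by that very description, one of the rare compressible ones.
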
00:10:15.800 | - Yeah, okay, so beautifully put.
00:10:17.080 | But so we can just call it artificial intelligence.
00:10:21.240 | We don't need to have the "general."
00:10:21.240 | - Or human level.
00:10:23.000 | Human level intelligence is good.
00:10:27.560 | Anytime you touch human, it gets interesting
00:10:30.840 | because we attach ourselves to human
00:10:35.840 | and it's difficult to define what human intelligence is.
00:10:40.440 | Nevertheless, my definition is maybe
00:10:44.160 | damn impressive intelligence.
00:10:47.480 | Okay, damn impressive demonstration of intelligence,
00:10:50.200 | whatever.
00:10:51.040 | (upbeat music)