Vladimir Vapnik: Predicates, Invariants, and the Essence of Intelligence | Lex Fridman Podcast #71
Chapters
0:00 Introduction
2:55 Alan Turing: science and engineering of intelligence
9:09 What is a predicate?
14:22 Plato's world of ideas and world of things
21:06 Strong and weak convergence
28:37 Deep learning and the essence of intelligence
50:36 Symbolic AI and logic-based systems
54:31 How hard is 2D image understanding?
60:23 Data
66:39 Language
74:54 Beautiful idea in statistical theory of learning
79:28 Intelligence and heuristics
82:23 Reasoning
85:11 Role of philosophy in learning theory
91:40 Music (speaking in Russian)
95:08 Mortality
00:00:00.000 |
The following is a conversation with Vladimir Vapnik, 00:00:03.220 |
part two, the second time we spoke on the podcast. 00:00:07.200 |
He's the co-inventor of support vector machines, 00:00:12.120 |
and many foundational ideas in statistical learning. 00:00:17.300 |
worked at the Institute of Control Sciences in Moscow, 00:00:26.120 |
and now is a professor at Columbia University. 00:00:34.880 |
was just over a year ago, one of the early episodes. 00:00:41.460 |
titled "Complete Statistical Theory of Learning" 00:00:44.440 |
as part of the MIT series of lectures on deep learning 00:00:50.200 |
I'll release the video of the lecture in the next few days. 00:00:53.720 |
This podcast and the lecture are independent from each other 00:00:56.840 |
so you don't need one to understand the other. 00:00:59.420 |
The lecture is quite technical and math heavy, 00:01:06.800 |
since the podcast is probably a bit more accessible. 00:01:24.760 |
As usual, I'll do one or two minutes of ads now 00:01:49.880 |
Brokerage services are provided by Cash App Investing, 00:02:00.520 |
and security in all digital transactions is very important, 00:02:03.600 |
let me mention the PCI Data Security Standard, 00:02:06.760 |
PCI DSS Level 1, that Cash App is compliant with. 00:02:11.760 |
I'm a big fan of standards for safety and security, 00:02:21.020 |
and agreed that there needs to be a global standard 00:02:25.800 |
Now, we just need to do the same for autonomous vehicles 00:02:30.480 |
So again, if you get Cash App from the App Store 00:02:37.280 |
you get $10, and Cash App will also donate $10 to FIRST, 00:02:43.120 |
that is helping to advance robotics and STEM education 00:02:49.720 |
And now, here's my conversation with Vladimir Vapnik. 00:02:54.140 |
You and I talked about Alan Turing yesterday a little bit. 00:02:59.780 |
- And that he, as the father of artificial intelligence, 00:03:02.660 |
may have instilled in our field an ethic of engineering, 00:03:05.700 |
and not science, seeking more to build intelligence 00:03:13.300 |
between these two paths of engineering intelligence 00:03:27.620 |
You have to make a device which behave as human behave, 00:03:47.700 |
So I think, I believe, that it's somehow related 00:04:12.660 |
He call it units, which can explain human behavior, 00:04:20.780 |
He look at Russian tales and derive from that. 00:04:29.580 |
It is in TV, in movie serials, and so on and so on. 00:04:40.020 |
Morphology of the Folk Tale, describing 31 predicates 00:04:56.100 |
I'd like to talk about predicates in a focused way. 00:04:59.180 |
But let me, if you'll allow me to stay zoomed out 00:05:03.740 |
And he inspired a generation with the imitation game. 00:05:11.660 |
- Do you think, if we can linger on that a little bit longer, 00:05:15.220 |
do you think we can learn, do you think learning 00:05:25.460 |
So why do you think imitation is so far from understanding? 00:05:37.540 |
So your goal is to create something, something useful. 00:05:49.740 |
and I believe that it will be done even more. 00:05:52.340 |
You have self-driving cars and also this business. 00:06:19.220 |
I believe that intelligence, it is world of ideas. 00:06:25.900 |
And when you combine them with reality things, 00:06:43.580 |
in way to constructing invariant is intelligence. 00:07:00.820 |
For example, 31 predicates for human behavior, 00:07:12.340 |
31 predicates to describe stories, narratives. 00:07:19.380 |
how much of human behavior, how much of our world, 00:07:23.100 |
our universe, all the things that matter in our existence 00:07:32.660 |
- I think that we have a lot of formal behavior. 00:07:41.020 |
Because even in this example, which I gave you yesterday, 00:07:50.260 |
one predicate can construct many different invariants, 00:08:21.900 |
and 2D images a little bit in your challenge. 00:08:30.820 |
no, that hopes to be exactly about the spirit 00:08:34.460 |
of intelligence in the simplest possible way. 00:08:37.540 |
- Yeah, absolutely, you should start the simplest way. 00:08:42.700 |
- Well, there's an open question whether starting 00:08:49.220 |
towards intelligence, or it's an entirely different thing. 00:08:56.820 |
100, 200 times less examples, you need intelligence. 00:09:03.780 |
and it would be nice, I'd like to ask simple, 00:09:11.500 |
In terms of terms and how you think about it, 00:09:36.460 |
At the simplest level, we're not being profound currently. 00:09:40.700 |
A predicate is a statement of something that is true. 00:09:54.700 |
this is truly constraints of logical statements 00:10:00.220 |
- In my definition, the simplest predicate is function. 00:10:10.900 |
- What's the input, and what's the output of the function? 00:10:14.020 |
- Input is x, something which is input in reality. 00:10:50.980 |
as I described with linearity, with all this stuff. 00:10:55.020 |
But another, I believe, I don't know how many, 00:11:06.980 |
- It is formal definition, say something heavy 00:11:11.660 |
on the left corner, not so heavy in the middle, and so on. 00:11:17.060 |
You describe in general concept of what you assume. 00:11:31.600 |
There's a million ways we can talk about this. 00:11:41.560 |
But it's hard to put them, just like you're saying now, 00:11:47.760 |
When critics in music, trying to describe music, 00:11:57.100 |
And not too many predicate, but in different combination. 00:12:02.880 |
But they have some special words for describing music. 00:12:23.640 |
who can summarize the essence of images, human beings? 00:12:42.600 |
do you think there exists a small set of predicates 00:12:51.240 |
that the concept of what makes a two and a three and a four. 00:12:58.920 |
What, it should not describe two, three, four. 00:13:11.960 |
- An invariance, sorry to stick on this, but terminology. 00:13:21.160 |
I can say, looking at my image, it is more or less symmetric. 00:13:51.600 |
exactly in the way how musical critics describe music. 00:13:56.600 |
So, but this is invariant applied to specific data, 00:14:15.000 |
and world of reality and predicate and reality 00:14:18.920 |
is somehow connected and you have to know that. 00:14:24.020 |
So you draw a line from Plato to Hegel to Wigner to today. 00:14:35.560 |
There's a world of ideas and a world of things 00:14:40.440 |
And presumably the world of ideas is very small 00:14:49.840 |
like it's a shadow, the real world is a shadow 00:15:04.840 |
using these invariants because it is projection 00:15:09.320 |
for on specific examples which creates specific features 00:15:15.120 |
- So the essence of intelligence is while only being able 00:15:30.000 |
Intelligent musical critics knows all these words 00:15:56.720 |
of the human psychology, of the human experience 00:16:00.200 |
which seem to almost contradict intelligence and reason. 00:16:05.200 |
Like emotion, like fear, like love, all of those things. 00:16:10.520 |
Are those not connected in any way to the space of ideas? 00:16:18.760 |
I just want to be concentrate on very simple story, 00:16:27.960 |
- So you don't think you have to love and fear death 00:16:37.000 |
It involves a lot of stuff which I never consider. 00:16:49.320 |
to get the records from small number of observations, 00:17:02.760 |
But universal predicate which understand world of images. 00:17:21.640 |
- So like you said, symmetry is an interesting one. 00:17:32.160 |
So you think symmetry at the bottom is a universal notion 00:17:37.160 |
and there's degrees of a single kind of symmetry 00:17:46.040 |
There is a symmetry, anti-symmetry, say, letter S. 00:17:56.340 |
And it could be diagonal symmetry, vertical symmetry. 00:18:18.960 |
But that's just like one example of symmetry, right? 00:18:34.160 |
whatever I describe, you can have a degree of symmetry. 00:18:45.480 |
It is the same as you will describe this image. 00:18:56.920 |
Digits three is symmetric, more or less look for symmetry. 00:19:14.320 |
Or are these independent, distinct predicates 00:19:35.240 |
A degree of symmetry can be zero, no symmetry at all. 00:19:40.720 |
Or degree of symmetry, say, more or less symmetrical. 00:19:56.360 |
and anti-symmetry is also a concept of symmetry. 00:20:03.320 |
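A "degree of symmetry" predicate of the kind described here could be sketched as a plain function from an image to a number. This is only an illustration under stated assumptions: the name `degree_of_symmetry`, the choice of a left-right mirror axis, and the normalization are all hypothetical and are not taken from the conversation.

```python
def degree_of_symmetry(image):
    """Score in [0, 1] for left-right mirror symmetry (assumed measure).

    `image` is a list of rows, each row a list of pixel intensities.
    1.0 means the image equals its mirror image; 0.0 means no overlap at all.
    """
    diff = 0.0   # total absolute difference between pixels and their mirrors
    total = 0.0  # normalizer: total absolute intensity of the compared pairs
    for row in image:
        mirrored = row[::-1]
        for a, b in zip(row, mirrored):
            diff += abs(a - b)
            total += abs(a) + abs(b)
    if total == 0:
        return 1.0  # a blank image is treated as trivially symmetric
    return 1.0 - diff / total


symmetric = [[1, 0, 1],
             [0, 1, 0]]
lopsided = [[1, 1, 0]]
print(degree_of_symmetry(symmetric))  # perfectly mirrored image
print(degree_of_symmetry(lopsided))   # partially mirrored image
```

Other axes (horizontal, diagonal) or an anti-symmetry variant would be separate functions of the same shape, each returning a degree rather than a yes/no answer.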
I mean, symmetry is a fascinating notion, but-- 00:20:11.400 |
I would like to know predicates for digit recognition. 00:20:19.400 |
- It is not necessarily for digit recognition. 00:20:26.800 |
which you can use when you will have examples 00:20:35.040 |
You have regular problem of digit recognition, 00:20:38.320 |
you have examples of the first class, or second class. 00:20:41.640 |
Plus, you know that there exists concept of symmetry. 00:20:45.880 |
And you apply when you're looking for decision rule, 00:20:55.440 |
of this level of symmetry, which you estimate from. 00:21:00.160 |
So let's talk, everything comes from weak convergence. 00:21:05.160 |
- What is convergence, what is weak convergence, 00:21:15.320 |
- You're converging, you would like to have a function. 00:22:00.240 |
Say symmetry, and I can measure level of symmetry 00:22:08.060 |
And then I can take average from my training data, 00:22:36.460 |
will give me the same average as I observe on training data. 00:22:53.500 |
You show this predicate, show general property 00:23:10.380 |
and you select as admissible set of function, 00:23:20.760 |
So you're immediately looking for smaller subset of function. 00:23:24.940 |
- That's what you mean by admissible functions. 00:23:32.780 |
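The idea of keeping only decision rules that reproduce a predicate's average on the training data can be sketched as follows. This is a minimal illustration, not the method from the lecture: the helper names `predicate_average` and `is_admissible`, the tolerance `tol`, and the choice to weight the predicate by the label are assumptions.

```python
def predicate_average(predicate, xs, outputs):
    """Average of predicate(x) * output over the sample."""
    return sum(predicate(x) * o for x, o in zip(xs, outputs)) / len(xs)


def is_admissible(rule, predicate, xs, ys, tol=0.05):
    """True if the rule reproduces the predicate average seen in the labels."""
    observed = predicate_average(predicate, xs, ys)
    predicted = predicate_average(predicate, xs, [rule(x) for x in xs])
    return abs(observed - predicted) <= tol


# Toy training data: one real-valued feature, binary labels.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]

identity_predicate = lambda x: x            # hypothetical predicate
threshold_rule = lambda x: 1 if x >= 1.5 else 0
always_one_rule = lambda x: 1

print(is_admissible(threshold_rule, identity_predicate, xs, ys))
print(is_admissible(always_one_rule, identity_predicate, xs, ys))
```

Each additional predicate adds one more such check, so the set of rules that pass all of them can only shrink.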
- It's pretty large, but if you have one predicate. 00:23:36.600 |
But according to, there is a strong and weak convergence. 00:23:41.600 |
Strong convergence is convergence in function. 00:23:45.280 |
You're looking for the function, on one function, 00:23:51.900 |
And square difference from them should be small. 00:24:01.880 |
make a square, make an integral, and it should be small. 00:24:08.040 |
Suppose you have some function, any function. 00:24:16.960 |
If integral from square difference between them is small. 00:24:22.880 |
- That's the definition of strong convergence. 00:24:25.800 |
- Two functions, the integral of the difference is small. 00:24:32.320 |
- But you have different convergence in functionals. 00:24:36.760 |
You take any function, you take some function, phi, 00:24:40.120 |
and take inner product, this function is f function. 00:25:03.080 |
if this value of inner product converge to value f zero. 00:25:12.520 |
But weak convergence requires that it converge 00:25:20.680 |
If it converge for any function of Hilbert space, 00:25:24.280 |
then you will say that this is weak convergence. 00:25:32.240 |
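The two notions of convergence described here can be written out as follows. This is a sketch in standard notation: the sequence index \ell, the domain of integration, and the use of L_2 as the Hilbert space are assumptions; f_0 and \phi follow the names used in the conversation.

```latex
% Strong convergence: the functions themselves get close in the L_2 metric.
\lim_{\ell \to \infty} \int \bigl( f_\ell(x) - f_0(x) \bigr)^2 \, dx = 0

% Weak convergence: every inner product with a test function converges.
\lim_{\ell \to \infty} \int \phi(x)\, f_\ell(x) \, dx
  = \int \phi(x)\, f_0(x) \, dx
  \quad \text{for all } \phi \in L_2
```

Strong convergence implies weak convergence, since by the Cauchy-Schwarz inequality each inner product difference is bounded by \|\phi\| \, \|f_\ell - f_0\|.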
that is property, integral property of function. 00:25:36.000 |
For example, if you will take sine or cosine, 00:25:39.200 |
it is coefficient of, say, Fourier expansion. 00:25:50.560 |
of Fourier expansion, so under some condition, 00:26:08.660 |
So weak convergence means integral property of functions. 00:26:16.120 |
I would like to formulate which integral properties 00:26:52.960 |
which is admissible in the sense that function 00:26:57.680 |
which I looking for in this set of functions. 00:27:12.640 |
- Yeah, but property, you can know independent 00:27:27.280 |
But Russian fairy tale is not so interesting. 00:27:42.000 |
- Well, so I would argue that there's a little bit 00:27:48.560 |
that were applied to which are essentially stories 00:27:55.920 |
You're saying digits, there's a story within the digit. 00:28:22.120 |
wide set of function, but from set of function 00:28:28.080 |
But predicates is not related just to digit recognition. 00:28:37.700 |
- Do you think it's possible to automatically 00:28:42.160 |
So you basically said that the essence of intelligence 00:28:55.200 |
you know, that's what Einstein was good at doing in physics. 00:28:58.200 |
Can we make machines do these kinds of discovery 00:29:11.440 |
Because according to theory about weak convergence, 00:29:16.440 |
any function from Hilbert space can be predicated. 00:29:23.200 |
So you have infinite number of predicate in upper, 00:29:27.680 |
and before you don't know which predicate is good and which. 00:29:32.680 |
But whatever Propp show and why people call it breakthrough, 00:29:37.840 |
that there is not too many predicate which cover 00:29:53.200 |
And most of the, only a small amount are useful 00:29:57.800 |
for the kinds of things that happen in the world. 00:30:00.240 |
- I think that I would say only small part of predicate, 00:30:18.160 |
So can we linger on it, what's your intuition, 00:30:21.800 |
why is it hard for a machine to discover good predicates? 00:30:26.800 |
- Even in my talk described how to do predicate. 00:30:36.680 |
- No, in my talk I gave example for diabetes. 00:30:48.440 |
where some sort of predicate, which I formulate, 00:31:07.000 |
I select only function which keeps it invariant. 00:31:11.160 |
And when I did it, I improved my performance. 00:31:34.200 |
because that's the essence, that's the challenge, 00:31:36.320 |
that is artificial, that's the human level intelligence 00:31:40.360 |
that we seek, is the discovery of these good predicates. 00:31:43.840 |
You've talked about deep learning as a way to, 00:31:47.520 |
the predicates they use and the functions are mediocre. 00:31:58.200 |
- I know only Yann LeCun's convolutional network. 00:32:05.280 |
I don't know, and it's a very simple convolution. 00:32:25.480 |
for translation and predicate, this should be kept. 00:32:32.480 |
but humans discovered that one, or at least-- 00:32:39.040 |
And that is big story because Yann did it 25 years ago 00:32:43.760 |
and nothing so clear was added to deep network. 00:32:57.440 |
instead of talking about piecewise linear functions 00:33:07.320 |
that maybe the amount of predicates necessary 00:33:11.160 |
to solve general intelligence, say in the space of images, 00:33:16.160 |
doing efficient recognition of handwritten digits 00:33:22.360 |
And so we shouldn't be so obsessed about finding, 00:33:44.200 |
to learn at which part of the input to look at. 00:33:47.640 |
The thing is, there's other things besides predicates 00:33:51.040 |
that are important for the actual engineering mechanism 00:34:02.120 |
- I mean, that's essentially the work of deep learning 00:34:07.160 |
that are able to be, given the training data, 00:34:14.720 |
a function that can approximate, can generalize well. 00:34:45.360 |
- Large, large number of piecewise linear functions. 00:35:03.000 |
- It's space with infinite number of coordinates, 00:35:07.160 |
say, or function for expansion, something like that. 00:35:13.460 |
So when I'm talking about closed form solution, 00:35:20.840 |
not piecewise linear set which is particular case. 00:35:30.920 |
- So neural networks is a small part of the space 00:35:33.600 |
you're talking about, of functions you're talking about. 00:35:50.080 |
So now when you're trying to create architecture, 00:35:54.360 |
you would like to create admissible set of functions 00:36:06.000 |
Say, when you're introducing convolutional net, 00:36:10.140 |
it is way to make this subset useful for you. 00:36:19.800 |
it is something you want to keep some invariants, 00:36:46.720 |
As I say, this set of functions should be admissible, 00:36:55.260 |
You know that as soon as you incorporate new invariants, 00:36:59.080 |
set of function becomes smaller and smaller and smaller. 00:37:02.160 |
- But all the invariants are specified by you, the human. 00:37:05.540 |
- Yeah, but what I hope that there is a standard predicate, 00:37:14.200 |
that what I want to find for digit recognition. 00:37:34.780 |
But you know, it is amusing that mathematician 00:37:39.800 |
doing something in neural network, in general function, 00:38:08.200 |
who studied theoretical literature, he found that. 00:38:12.280 |
- You know what, let me throw that right back at you, 00:38:25.840 |
- And you just said another emotional statement, 00:38:30.120 |
which is you believe that this Plato world of ideas 00:38:37.880 |
What's your intuition, though, if we can linger on it? 00:38:58.920 |
But my goal to decrease set of function much. 00:39:27.840 |
- So if each good predicate significantly reduces 00:39:32.640 |
that there naturally should not be that many good predicates. 00:39:35.600 |
- No, but if you reduce very well the VC dimension 00:39:40.600 |
of the function, of admissible set of function, 00:39:55.360 |
is some measure of capacity of this set of functions. 00:39:58.600 |
Roughly speaking, how many function in this set. 00:40:18.880 |
but the good predicate, it's such that can do that. 00:40:23.880 |
So for this duck, you should know a little bit about duck, 00:40:31.600 |
- What are the three fundamental laws of ducks? 00:40:38.400 |
- You should know something about ducks to be able to-- 00:40:50.000 |
- And talk like, and make sound like horse, or something. 00:40:58.520 |
It is general predicate that this applied to duck. 00:41:04.640 |
But for duck, you can say, play chess like duck. 00:41:12.680 |
- So you're saying you can, but that would not be a good-- 00:41:15.800 |
- No, you will not reduce a lot of functions. 00:41:18.240 |
- You would not do, yeah, you would not reduce 00:41:25.140 |
mathematical story, is that you can use any function 00:41:30.340 |
But some of them are good, some of them are not, 00:41:33.200 |
because some of them reduce a lot of functions 00:41:39.760 |
- But the question is, and I'll probably keep asking 00:42:01.480 |
Like, guy who understand music can say this word 00:42:06.240 |
which he described when he listened to music. 00:42:11.720 |
He use not too many different, or you can do like Propp. 00:42:15.600 |
You can make collection what he talking about music, 00:42:21.040 |
It's not too many different situation he described. 00:42:25.040 |
- Because we mentioned Vladimir Propp a bunch, 00:42:26.960 |
let me just mention, there's a sequence of 31 00:42:30.200 |
structural notions that are common in stories, 00:42:43.600 |
absentation, a member of the hero's community or family 00:42:51.040 |
a forbidding edict or command is passed upon the hero, 00:43:07.580 |
Then, reconnaissance, the villain makes an effort 00:43:10.400 |
to attain knowledge, needing to fulfill their plot, 00:43:14.240 |
ends in a wedding, number 31, happily ever after. 00:43:19.240 |
- No, he just gave description of all situation. 00:43:39.960 |
- And probably in our lives, we probably live-- 00:43:45.080 |
At the end, they wrote that this predicate is good 00:43:50.080 |
for different situation, for movie, for theater. 00:44:14.400 |
but it's looking at paradise behind the scenes. 00:44:35.960 |
Doesn't matter how, but they exist, probably. 00:44:46.240 |
whether without our human brains to interpret these units, 00:44:50.360 |
they would still hold as much power as they have. 00:45:14.200 |
- You understand characters, you understand-- 00:45:18.920 |
It's the imitation versus understanding question, 00:45:50.600 |
of what makes a good hand recognition system my own. 00:45:56.440 |
It seems like that's a very powerful predicate. 00:46:13.080 |
thousands of predicates, millions of predicates, 00:46:28.960 |
You're using digits, you're using examples as well. 00:46:47.960 |
You just will have admissible set of functions 00:46:57.120 |
So the trade-off is when you're not using all predicates, 00:47:14.720 |
I'm gonna keep asking the same dumb question, 00:47:19.120 |
To solve the challenge, you kind of propose a challenge 00:47:21.560 |
that says we should be able to get state-of-the-art 00:47:31.520 |
What kind of predicates do you think you'll-- 00:47:46.580 |
- They just need to write function, that's it. 00:47:50.820 |
- But, so can that function be written, I guess, 00:48:01.120 |
learning a particular function, or another mechanism? 00:48:26.380 |
do the reverse step of helping you find a function. 00:48:33.600 |
is to find a disentangled representation, for example, 00:48:38.180 |
that they call, is to find that one predicate function 00:48:46.860 |
but one very useful essence of this particular visual space. 00:48:54.060 |
Listen, I'm grasping, hoping there's an automated way 00:49:18.420 |
which you're suggesting, don't create invariant. 00:49:28.820 |
Find situation where existing theory cannot explain it. 00:49:42.780 |
Find contradiction, and then remove this contradiction. 00:49:48.940 |
you find function, which, if you will use this function, 00:49:55.060 |
- So it's really the process of discovering contradictions. 00:50:15.520 |
Then include this predicate, making invariants, 00:50:22.120 |
But it is not the best way, probably, I don't know, 00:50:44.680 |
There's what, in the '80s, with expert systems, 00:51:14.400 |
You know, that when smart people sit in a room 00:51:17.600 |
and reason through things, it seems compelling. 00:51:20.360 |
And making our machines do the same is also compelling. 00:51:38.600 |
You have invariants, and you can choose the function you want. 00:51:53.000 |
So, and how from infinite number of function, 00:51:59.940 |
to select finite number, and hopefully small number 00:52:11.080 |
to extract small set of admissible functions. 00:52:19.800 |
because every function just decrease set of function 00:52:27.720 |
- But why do you think logic-based systems can't help? 00:52:44.280 |
And he tried to put in invariant his understanding. 00:53:21.820 |
it's fear of death, it's love, it's spirituality, 00:53:30.820 |
All of it is tied up into understanding gravity, 00:53:59.260 |
If I will have a student concentrate on this work, 00:54:06.860 |
Yeah, it's a beautifully simple, elegant, and yet-- 00:54:10.780 |
- I think that I know invariants which will solve this. 00:54:20.940 |
I want some universal invariants which are good 00:54:25.060 |
not only for digit recognition, for image understanding. 00:54:37.100 |
So if we can kind of intuit handwritten recognition, 00:54:43.820 |
how big of a step, leap, journey is it from that? 00:54:48.820 |
If I gave you good, if I solved your challenge 00:54:56.520 |
to understanding more general natural images? 00:55:07.740 |
As soon as you will create several invariants 00:55:13.020 |
which will help you to get the same performance 00:55:22.820 |
using 100 times, maybe more than 100 times less examples, 00:55:35.220 |
Because you should put some idea how to do that. 00:55:50.440 |
And it seems like how much complicated is the fact 00:55:58.020 |
of a three-dimensional world onto a 2D plane. 00:56:22.940 |
You're talking that there are different invariants. 00:56:36.340 |
- Well, yeah, but you said that it would be-- 00:56:44.820 |
But I'm sure that I don't understand everything there. 00:56:49.300 |
- It's like Einstein say, do as simple as possible, 00:56:58.980 |
- Yeah, but never, that's the difference between you and I. 00:57:12.380 |
without having solved handwritten recognition 00:57:25.680 |
Because ultimately, while the science of intelligence 00:57:31.700 |
how that maps to the engineering of intelligence. 00:57:39.340 |
doesn't help you, it might, it may not help you 00:57:47.400 |
It'll help you a little bit, we don't know how much. 00:58:03.100 |
I start with very general problem, with Plato. 00:58:14.500 |
- So you basically took Plato and the world of forms 00:58:19.140 |
and ideas and mapped and projected into the clearest, 00:58:26.820 |
- You know, I would say that I did not understand Plato 00:58:31.540 |
until recently, and until I consider weak convergence 00:58:36.540 |
and then predicate and then, oh, this is what Plato taught. 00:58:47.120 |
Like why, how do you think about this world of ideas 00:59:00.540 |
- But it is a way how you should try to understand 00:59:17.540 |
Say, Plato's and Hegel, whatever reasonable it exists, 00:59:26.700 |
I don't know what he have in mind, reasonable. 00:59:42.440 |
And then it comes suddenly to Vladimir Propp. 00:59:47.100 |
Look, 31 ideas, 31 units, and describes everything. 00:59:52.880 |
- There's abstractions, ideas that represent our world. 01:00:03.320 |
- Yeah, but you should make a projection on reality. 01:00:42.280 |
like you said, teacher, small examples of data 01:00:55.540 |
that the selection of data may be a powerful journey, 01:01:00.300 |
a useful, you know, coming up with a mechanism 01:01:06.460 |
Do you find this idea of finding the right data set 01:01:13.960 |
Or do you kind of take the data set as a given? 01:01:16.680 |
- I think that it is, you know, my scheme is very simple. 01:01:25.880 |
If you will apply, and you have not too many data, 01:01:30.880 |
if you pick up function which describes this data, 01:01:42.240 |
- Yeah, you will overfit, it will be overfitting. 01:01:53.660 |
So you should go somehow to admissible set of function. 01:02:19.400 |
which you will measure property of your function. 01:02:55.620 |
You need decrease, you need admissible set of function. 01:03:00.160 |
But what, say you have more data than functions. 01:03:12.140 |
But what, I was trying to be poetic for a second. 01:03:38.760 |
Like the optimization should be in the space of functions. 01:03:48.140 |
- No, you know, even from the classical VC theory, 01:04:17.820 |
Small set of function which contain function by looking for. 01:04:27.620 |
- Yeah, but that is another story, I don't touch it. 01:04:43.380 |
The story is that in statistics there are two law. 01:04:46.740 |
Law of large numbers and uniform law of large numbers. 01:04:55.060 |
of large numbers but not uniform law of large numbers. 01:04:58.260 |
- Right, so 60 is law of large, it's large enough. 01:05:05.580 |
some bounds, so it's, but the idea is the following. 01:05:10.060 |
If you trust that, say, this average gives you 01:05:21.020 |
so you can talk about that, about this predicate. 01:05:29.800 |
- Good predicates is the, the discovery of good predicates 01:05:34.580 |
- No, no, it is discovery of your understanding world. 01:05:59.900 |
Then you will apply them to reality, to your data. 01:06:09.660 |
But predicate are not related specifically to your task, 01:06:23.260 |
- Many tasks that you might be interested in. 01:06:35.680 |
It was for fairy tales, but it's happened everywhere. 01:06:38.540 |
- Okay, so we talked about images a little bit, 01:06:42.180 |
but can we talk about Noam Chomsky for a second? 01:06:58.260 |
- Well let me just say, do you think language, 01:07:01.020 |
human language, is essential to expressing ideas, 01:07:14.940 |
- For me, language, and all the story of language, 01:07:20.740 |
I don't understand this, and I'm not, I thought about-- 01:07:26.560 |
- I'm not ready to work on that, because it's so huge. 01:07:30.780 |
It is not for me, and I believe not for our century. 01:07:45.100 |
- So you think, okay, you think digit recognition, 01:07:49.260 |
2D image, how would you more abstractly define 01:07:56.460 |
It's 2D image, symbol recognition, essentially? 01:08:33.420 |
Take the simplest problem, as simple as possible, 01:08:50.360 |
When you will do this, you will find some predicate, 01:09:04.140 |
but that doesn't help you with quantum mechanics. 01:09:17.540 |
whether handwritten recognition is like general relativity, 01:09:23.140 |
so you're still gonna have to do a lot of mess 01:09:31.940 |
so what's your intuition why handwritten recognition 01:09:40.900 |
Just, I think a lot of people would agree with that, 01:09:45.300 |
but if you could elucidate sort of the intuition of why. 01:09:51.780 |
- I don't, no, no, I don't think in this direction. 01:09:56.460 |
I just think in the direction that this is problem, 01:10:05.140 |
we will create some abstract understanding of images. 01:10:19.700 |
I would like to talk to guys who doing real images 01:10:35.140 |
I still, symmetry will play a role in real life images, 01:10:55.940 |
- So the people I know in vision science, for example, 01:11:11.560 |
As far as I know, not much predicate type of thinking 01:11:19.420 |
- They don't, yeah, but how do you even begin 01:11:27.740 |
- Because if we will be able to show that it is worth working 01:11:40.340 |
- So the unfortunate, so if we compare to language, 01:11:43.340 |
language is like letters, a finite set of letters 01:11:46.520 |
and a finite set of ways you can put together those letters, 01:11:50.500 |
so it feels more amenable to kind of analysis. 01:11:53.720 |
With natural images, there is so many pixels. 01:12:08.020 |
It's not just understanding of very simple class of tasks. 01:12:14.020 |
I would like to see lists of tasks with language involved. 01:12:19.020 |
- Yes, so there's a lot of nice benchmarks now 01:12:23.220 |
in natural language processing from the very trivial, 01:12:26.480 |
like understanding the elements of a sentence 01:12:30.180 |
to question answering to much more complicated 01:12:36.100 |
The natural question is with handwritten recognition, 01:12:44.600 |
- Right, but even our records show that we go 01:12:49.600 |
in the wrong direction because we need 60,000 digits. 01:12:56.580 |
- So even this first step, so forget about talking 01:12:59.660 |
about the full journey, this first step should be taken 01:13:07.180 |
- No, I'm saying it should be taken in the right direction 01:13:13.660 |
- If you can talk, it's great, we have half percent of error. 01:13:18.480 |
- And hopefully the step from doing hand recognition 01:13:22.760 |
using very few examples, the step towards what babies do 01:13:26.840 |
when they crawl and understand their physical environment. 01:13:48.360 |
That means that you will use weak convergence, 01:13:54.480 |
- Do you think these principles will naturally 01:14:07.600 |
Or are they going to be very kind of abstract 01:14:14.440 |
- For example, I talked yesterday about symmetry. 01:14:25.760 |
- Yes, for different symmetries and you have for-- 01:14:29.560 |
- Degree of symmetry, that is important, not just symmetry. 01:14:40.240 |
- No, it's not for handwritten, it's for images. 01:14:50.920 |
So a lot of the things we've been talking about falls, 01:14:59.800 |
we've been talking about philosophy a little bit, 01:15:08.080 |
a universal idea of statistical theory of learning. 01:15:10.740 |
What is the most beautiful and sort of powerful 01:15:20.800 |
in the world of statistics or statistic theory of learning? 01:15:38.980 |
So for any function, expectation of function, 01:16:04.960 |
uniform convergence, just convergence is not enough. 01:16:08.540 |
Because when you pick up one which gives minimum, 01:16:15.680 |
you can pick up one function which does not converge 01:16:21.660 |
and it will give you the best answer for this function. 01:16:28.020 |
So you need uniform convergence to guarantee learning. 01:16:34.920 |
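The distinction between the ordinary and the uniform law of large numbers mentioned here can be stated as follows. This is a sketch in standard notation: the sample size \ell, the function class \mathcal{F}, and the i.i.d. sample x_1, \dots, x_\ell are assumptions introduced for illustration.

```latex
% Law of large numbers: convergence for one fixed function f.
\frac{1}{\ell} \sum_{i=1}^{\ell} f(x_i)
  \;\xrightarrow[\ell \to \infty]{P}\; \mathbb{E}\, f(x)

% Uniform law of large numbers: convergence simultaneously over the class.
\sup_{f \in \mathcal{F}}
  \left| \mathbb{E}\, f(x) - \frac{1}{\ell} \sum_{i=1}^{\ell} f(x_i) \right|
  \;\xrightarrow[\ell \to \infty]{P}\; 0
```

The second statement is the one learning needs, because the function that minimizes the empirical average is chosen after seeing the sample, so a guarantee for each fixed function separately does not cover it.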
So learning does not rely on trivial law of large numbers, 01:16:56.860 |
as I think about myself, how stupid I was 50 years, 01:17:10.960 |
But now I think that most powerful is weak convergence. 01:17:15.280 |
Because it makes admissible set of functions. 01:17:34.600 |
But when we're trying to create artificial intelligence, 01:17:48.780 |
- So reducing the set of admissible functions, 01:17:57.720 |
understanding the properties of weak convergence? 01:18:19.600 |
And it so happened, when we use Hilbert space, 01:18:27.800 |
which is very rich space, space of continuous functions, 01:18:36.880 |
So we can apply weak and strong convergence for learning 01:19:09.760 |
that heuristics are a mess that should be removed 01:19:20.840 |
right instrument, you have closed form solution. 01:19:26.280 |
- Do you think intelligence, human level intelligence, 01:19:53.880 |
I'm thinking what is the most appropriate kernel 01:20:09.720 |
But looking on the bound, I think that I start to understand 01:20:28.440 |
So I'm again trying to understand what type of kernel 01:20:37.560 |
not an approximation, best fit to this bound. 01:20:45.600 |
that could be done in discovering better function 01:20:52.800 |
- It still comes from, you're looking to math 01:21:03.920 |
- Then I'm trying to understand what will be good for that. 01:21:14.000 |
again, maybe I'm a descendant of valentorian, 01:21:43.800 |
But by the way, it is the same story about predicate. 01:21:53.880 |
situation is much more than you have rule for that. 01:22:39.640 |
There's an element of sequentially disassembling, 01:22:48.420 |
So when you think of handwritten recognition, 01:23:01.080 |
- What do you think about sort of the idea of recurrence, 01:23:04.440 |
of going back to memory and thinking through this sort of 01:23:07.480 |
sequentially mangling the different representations 01:23:12.480 |
over and over until you arrive at a conclusion? 01:23:17.940 |
Or is ultimately all that can be wrapped up into a function? 01:23:22.940 |
- You're suggesting that let us use this type of algorithm. 01:23:31.060 |
I first of all starting to understand what I want. 01:23:45.020 |
And when I do that, I think I have to solve this problem. 01:24:08.700 |
- Do you try to, it's the imitation question, right? 01:24:19.620 |
does that inspire in you a thought that we need to add that 01:24:29.000 |
You're saying, okay, I mean, you've kind of answered saying 01:24:44.980 |
In reasoning, in human, it is for me too complicated. 01:24:50.860 |
For me, the most difficult part is to ask questions, 01:25:13.640 |
about technical things, speaking of questions, 01:25:18.220 |
So what role does philosophy play in machine learning? 01:25:23.500 |
We talked about Plato, but generally thinking 01:25:39.500 |
It's like predicate, like say admissible set of functions. 01:25:56.480 |
was done 50 years ago, it all that, this is theory. 01:26:02.240 |
If you have data, you can, and your set of function 01:26:15.760 |
You can make structural risk minimization, control capacity. 01:26:27.980 |
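Structural risk minimization can be sketched in a few lines. The example below is an illustration under stated assumptions, not the procedure or the bound from the original theory: the nested classes are histogram classifiers with `k` equal bins, the data are synthetic with 10% label noise, and the capacity penalty `sqrt(k * log(n) / n)` is a hypothetical stand-in for a real capacity term. The idea it shows is the one named above: order the sets of functions by capacity, then pick the set that minimizes empirical risk plus a capacity term.

```python
import math
import random


def bin_index(x: float, k: int) -> int:
    """Index of the equal-width bin of [0, 1) that contains x."""
    return min(int(x * k), k - 1)


def fit_histogram(xs, ys, k):
    """Majority-vote label for each of k bins (empirical risk minimizer)."""
    counts = [0] * k
    ones = [0] * k
    for x, y in zip(xs, ys):
        b = bin_index(x, k)
        counts[b] += 1
        ones[b] += y
    return [1 if 2 * ones[b] > counts[b] else 0 for b in range(k)]


def empirical_risk(labels, xs, ys) -> float:
    """Training error of a fitted histogram classifier."""
    k = len(labels)
    errors = sum(labels[bin_index(x, k)] != y for x, y in zip(xs, ys))
    return errors / len(xs)


# Synthetic data: label is 1 when x > 0.5, flipped with probability 0.1.
rng = random.Random(0)
n = 500
xs = [rng.random() for _ in range(n)]
ys = [(1 if x > 0.5 else 0) ^ (1 if rng.random() < 0.1 else 0) for x in xs]

capacities = [1, 2, 4, 8, 16, 32, 64]  # nested: each partition refines the last
risks = {}
scores = {}
for k in capacities:
    labels = fit_histogram(xs, ys, k)
    risks[k] = empirical_risk(labels, xs, ys)
    penalty = math.sqrt(k * math.log(n) / n)  # hypothetical capacity term
    scores[k] = risks[k] + penalty

best_k = min(scores, key=scores.get)
print("empirical risks:", risks)
print("selected number of bins:", best_k)
```

Empirical risk alone never increases as `k` grows, so it would always favor the richest class. Adding the capacity term is what makes the selection stop at a small class that already explains the data.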
Now, when suddenly realize that we did not use 01:26:48.020 |
that we should be swimming in the space of ideas. 01:26:56.860 |
So understanding of life, say people like Plato, 01:27:03.500 |
they understand on very high abstract level of life. 01:27:06.800 |
So, and whatever I'm doing, it's just implementation 01:27:36.580 |
- So that required thinking about life a little bit. 01:27:43.260 |
Hard to trace, but there was some thought process. 01:27:48.860 |
- You know, I'm working, I'm thinking about the same problem 01:28:00.020 |
I'm trying to be honest and that is very important. 01:28:13.360 |
And now I understand that because I believe in math, 01:28:23.740 |
But now when I see that there are only two ways 01:28:32.060 |
that means that we must do as well as people doing. 01:28:37.940 |
But now exactly in philosophy and what we know 01:28:50.100 |
I thought about that and that is more or less obvious. 01:28:59.020 |
But next, I have a feeling it's something about structures. 01:29:11.820 |
how to measure measure of structure and all that stuff. 01:29:16.180 |
And the guy who will solve this challenge problem, 01:29:34.180 |
- Oh yeah, absolutely, symmetry will be there. 01:29:43.020 |
diagonal, vertical, I even don't know how you can use 01:29:54.940 |
I think that people are very sensitive to idea of symmetry. 01:30:07.020 |
But you cannot learn just thinking about that. 01:30:11.820 |
You should do challenging problems and then analyze them, 01:30:36.340 |
Is people describe in language strong convergence 01:30:50.100 |
and story like that, when you will explain to kid, 01:30:59.420 |
But when you try to formalize, you're just ignoring this. 01:31:05.820 |
Why, why 50 years from start of machine learning? 01:31:23.660 |
empirical risk minimization, and all this stuff. 01:31:27.740 |
If you read now textbooks, they're just about bounds 01:31:34.380 |
They don't look for another problem like admissible set. 01:31:45.060 |
perhaps we, you could talk in Russian for a little bit. 01:31:58.540 |
- How about, can you try to answer in Russian? 01:33:42.860 |
And then Chopin, it is very different vocabulary, 01:33:55.140 |
And I think that if you will make collection of that, 01:34:02.700 |
so maybe from this you can describe predicates 01:34:12.500 |
- From the critic interpretation of the music, yeah. 01:34:18.620 |
what they use, they describe high-level ideas 01:34:29.700 |
So art is not self-explanatory in some sense. 01:34:40.980 |
When you go from ideas to the representation, 01:34:51.420 |
But nevertheless, I believe that when you're looking 01:34:55.860 |
from that, even from art, you will be able to find 01:35:02.060 |
- That's such a fascinating and powerful notion. 01:35:29.660 |
It is a pity that I will not be able to do something 01:35:38.740 |
For example, I will be very happy to work with guys, 01:35:44.460 |
theoretician from music, to write this collection 01:35:56.940 |
And from art as well, then take what is in common 01:36:26.460 |
- Well, see, you've got the patient mathematician's mind. 01:36:30.900 |
I think it could be done very quickly and very beautifully. 01:36:41.900 |
this collection, to understand what is common 01:36:46.260 |
to think about that once again and again and again. 01:36:49.500 |
- Again and again and again, but I think sometimes, 01:37:08.740 |
I think there'll be sparks of ideas that'll come. 01:37:21.580 |
So I have a friend who was a specialist in Russian poetry. 01:37:35.260 |
She did not write poems, but she knows a lot of stuff. 01:37:40.260 |
She made a book, several books, and one of them 01:38:08.500 |
and we get 100 digits, or maybe less than 100. 01:38:25.260 |
using only words of images of Russian poetry. 01:38:37.580 |
I call it learning using privileged information. 01:38:53.060 |
and another language poetic description of this image. 01:39:04.500 |
using privileged information, you're doing better. 01:39:19.020 |
The collection of digits in poetic descriptions 01:39:27.260 |
- So there's something there in that poetic description. 01:39:32.900 |
- But I think that there are abstract ideas 01:39:40.700 |
- Yeah, that they're there, that could be discovered. 01:39:45.060 |
- As soon as we start this challenge problem. 01:39:51.180 |
- It immediately connected to all this stuff. 01:39:55.420 |
- Especially with your talk and this podcast, 01:40:00.100 |
It's such a clean, beautiful Einstein-like formulation 01:40:16.660 |
What's the predicate for mysterious existence 01:40:43.100 |
They are writing pictures, they're thinking about 01:41:18.220 |
- So the purpose of life is to create two paths. 01:42:12.620 |
He knows this predicate, he knows big blocks of life. 01:42:39.180 |
most of them are guys who study English language 01:43:15.340 |
- It amazes me that you are and continue to be humbled 01:43:44.860 |
- Let's talk again when your challenge is taken on 01:44:05.540 |
And thank you to our presenting sponsor, Cash App. 01:44:14.340 |
an organization that inspires and educates young minds 01:44:17.060 |
to become science and technology innovators of tomorrow. 01:44:20.740 |
If you enjoy this podcast, subscribe on YouTube, 01:44:25.340 |
support on Patreon, or simply connect with me on Twitter 01:44:37.740 |
do not solve a more general problem as an intermediate step.