Vladimir Vapnik: Statistical Learning | Lex Fridman Podcast #5
Chapters
0:00 Introduction
1:04 God doesn't play dice
4:08 Is math poetry
7:25 Human intuition
8:44 The role of imagination
9:59 The role of interpretation
12:48 The nature of information
15:58 The English proverb
18:17 An admissible set of functions
20:01 The task of learning
21:05 The process of learning
22:56 Deep learning as neural networks
27:27 The beauty of deep learning
30:34 Can machines think
33:07 Complexity
35:54 Edges
36:53 Learning in the world
39:39 Learning absolute
39:57 Line of work
40:45 Open problem
43:48 Invariance
46:16 The problem of intelligence
48:48 Poetry and music
50:37 Happiest moments
51:52 The possibility of discovery
00:00:00.000 |
The following is a conversation with Vladimir Vapnik. 00:00:03.000 |
He's the co-inventor of the support vector machine, 00:00:07.880 |
and many foundational ideas in statistical learning. 00:00:13.080 |
and worked at the Institute of Control Sciences in Moscow. 00:00:18.000 |
He worked at AT&T, NEC Labs, Facebook Research, 00:00:22.240 |
and now is a professor at Columbia University. 00:00:31.840 |
about artificial intelligence and the nature of learning, 00:00:34.760 |
especially on the limits of our current approaches 00:00:49.560 |
or rate it on iTunes or your podcast provider of choice, 00:00:55.280 |
or other social networks at Lex Fridman, spelled F-R-I-D. 00:01:00.160 |
And now, here's my conversation with Vladimir Vapnik. 00:01:03.760 |
Einstein famously said that God doesn't play dice. 00:01:09.960 |
- You have studied the world through the eyes of statistics, 00:01:12.840 |
so let me ask you in terms of the nature of reality, 00:01:17.320 |
fundamental nature of reality, does God play dice? 00:01:28.200 |
which could be important, it looks like God plays dice. 00:01:38.040 |
In philosophy, they distinguish between two positions, 00:01:51.000 |
where you're trying to understand what God did. 00:01:54.680 |
- Can you describe instrumentalism and realism a little bit? 00:01:58.440 |
- For example, if you have some mechanical laws, 00:02:50.080 |
So you're trying to really understand God's thought. 00:02:55.080 |
- So the way you see the world is as an instrumentalist? 00:03:27.160 |
because they say the goal of machine learning 00:03:36.640 |
That is true, but it is an instrument for prediction. 00:03:56.000 |
what is probability for another given situation? 00:04:04.320 |
But for understanding, I need conditional probability. 00:04:08.520 |
- So let me just step back a little bit first 00:04:10.640 |
to talk about, you mentioned, which I read last night, 00:04:14.000 |
the parts of the 1960 paper by Eugene Wigner, 00:04:50.400 |
How do you see the role of math in your life? 00:05:01.420 |
- Some people say that math is the language which God uses. 00:05:20.560 |
about effectiveness, unreasonable effectiveness of math, 00:05:27.820 |
is that if you're looking at mathematical structures, 00:05:37.720 |
And most scientists from natural science 00:05:42.480 |
are looking at equations and trying to understand reality. 00:05:50.080 |
If you very carefully look at all the equations 00:06:08.160 |
- So math can reveal the simple underlying principles 00:06:19.120 |
But then when you discover them and look at them, 00:06:26.800 |
And it is surprising why people did not see that before. 00:06:33.600 |
You're looking at the equation and deriving it from equations. 00:06:37.520 |
For example, I talked yesterday about the least squares method. 00:06:48.160 |
But if you go step by step, solving some equations, 00:06:52.400 |
you suddenly will get some term which, after thinking, 00:07:04.360 |
In the least squares method, we throw out a lot of information. 00:07:08.240 |
We don't look at the composition of the points of observation. 00:07:14.600 |
But when you understand that, it's a very simple idea. 00:07:25.680 |
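For readers who want to see the classical method being referenced, here is a minimal least squares sketch in Python (an editorial illustration, not code from the conversation; the toy data and variable names are invented):

```python
import numpy as np

# Toy data: noisy observations of the line y = 2x + 1 (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)

# Classical least squares: choose the coefficients minimizing the sum of
# squared residuals. Only the residuals enter the criterion; the composition
# of the observation points themselves is otherwise ignored, which is the
# information loss mentioned above.
X = np.column_stack([x, np.ones_like(x)])  # design matrix [x, 1]
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```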
- So some simple algebra, a few steps will take you 00:07:28.880 |
to something surprising that when you think about-- 00:07:53.560 |
But what about human, as opposed to intuition, 00:08:03.080 |
Do you have to be so hard on human intuition? 00:08:09.480 |
Are there moments of brilliance in human intuition 00:08:34.980 |
but axioms polished over generations of scientists. 00:08:34.980 |
when you think of Einstein and special relativity, 00:08:56.840 |
what is the role of imagination coming first there 00:09:44.760 |
It is just interpretation, it is just fantasy, 00:09:53.920 |
to derive, say, the main principle of machine learning. 00:09:58.920 |
- When you think about learning and intelligence, 00:10:07.620 |
that is something like what happens in the human brain. 00:10:45.500 |
But he wrote a report to the London Academy of Science. 00:11:08.180 |
and he imagined that it was an army fighting each other. 00:11:15.940 |
And he sent this report to the Academy of Science. 00:11:19.860 |
They looked very carefully, because they believed 00:11:28.340 |
And I believe the same can happen with brain. 00:11:48.100 |
than a thousand days of diligent study, one day 00:11:54.040 |
But if I ask you what the teacher does, nobody knows. 00:12:12.140 |
- So what, from a mathematical point of view, 00:12:18.140 |
- No, no, no, but we can say what a teacher can do. 00:12:41.280 |
But he knows that when you're using invariants, 00:12:43.540 |
he can decrease the number of observations 100 times. 00:12:47.380 |
- So but, maybe try to pull that apart a little bit. 00:12:53.020 |
I think you mentioned like a piano teacher saying 00:13:00.460 |
- I played piano, I played guitar for a long time. 00:13:02.420 |
And yeah, that's, there's, maybe it's romantic, poetic, 00:13:09.820 |
but it feels like there's a lot of truth in that statement. 00:13:12.600 |
Like there is a lot of instruction in that statement. 00:13:19.840 |
The language itself may not contain this information. 00:13:39.820 |
What is the representation of that information? 00:13:41.960 |
- I believe that it is a sort of predicate, but I don't know. 00:13:50.400 |
- Because the rest is just mathematical technique. 00:13:57.960 |
is that there are two types, two mechanisms of learning. 00:14:11.240 |
In the weak convergence mechanism, you can use predicates. 00:14:31.720 |
and quacks like a duck, then it is probably a duck. 00:14:42.840 |
So you saw many ducks; that is your training data. 00:14:56.520 |
- Yeah, the visual characteristics of a duck, yeah. 00:15:04.260 |
So you would like the theoretical description 00:15:07.920 |
from the model to coincide with the empirical description 00:15:28.920 |
And it is a completely legal predicate, but it is useless. 00:15:34.740 |
So a good teacher can recognize a predicate that is not useless. 00:15:55.620 |
Looks like a duck, swims like a duck, and quacks like a duck. 00:15:59.140 |
- So you can't deny the fact that swims like a duck 00:16:02.100 |
and quacks like a duck has humor in it, has ambiguity. 00:16:32.460 |
- So underneath, in order for us to understand 00:16:35.620 |
swims like a duck, it feels like we need to know 00:16:39.140 |
millions of other little pieces of information 00:16:45.180 |
There doesn't need to be this knowledge base. 00:16:48.140 |
Those statements carry some rich information 00:16:52.660 |
that helps us understand the essence of duck. 00:17:07.400 |
So what it does, you have a lot of functions. 00:17:11.180 |
And then you're talking, it looks like a duck. 00:17:30.220 |
Then you remove all functions which do not look 00:17:35.200 |
like you think they should look from training data. 00:17:51.880 |
And after that you pick up the best function you can. 00:18:15.840 |
- So you talk about an admissible set of functions 00:18:24.360 |
- So an admissible set of functions is a set of functions 00:18:42.560 |
So how would you describe to a layperson what VC theory is? 00:19:06.600 |
So it contains all continuous functions and it's useless. 00:19:11.720 |
You don't have so many examples to pick up a function. 00:19:27.200 |
It's an infinite set of functions, but not very diverse. 00:19:40.480 |
So the goal is to create an admissible set of functions 00:19:53.200 |
Then you should, you will be able to pick up the function 00:20:13.120 |
And then you've figured out a clever way of picking up. 00:20:35.520 |
the set of functions, the admissible set of functions, is given. 00:20:58.800 |
It should have a small VC dimension and contain a good function. 00:21:17.280 |
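As a rough editorial illustration of working with a restricted admissible set (not Vapnik's construction; polynomial degree is used here only as a crude stand-in for capacity, and all data and names are invented):

```python
import numpy as np

# Toy regression data.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-1.0, 1.0, 30))
y = np.sin(3.0 * x) + 0.1 * rng.standard_normal(30)

x_tr, y_tr = x[::2], y[::2]    # training half
x_va, y_va = x[1::2], y[1::2]  # held-out half

# Nested candidate sets: polynomials of growing degree. A tiny class may not
# contain a good function; a huge class is too diverse for 15 examples.
# Pick the function whose class still generalizes to held-out data.
best = None
for degree in range(1, 10):
    coef = np.polyfit(x_tr, y_tr, degree)
    val_err = float(np.mean((np.polyval(coef, x_va) - y_va) ** 2))
    if best is None or val_err < best[1]:
        best = (degree, val_err)
print("chosen degree:", best[0], "held-out error:", round(best[1], 4))
```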
- Yeah, you're looking at properties of the training data. 00:21:22.440 |
And properties means that you have some function 00:21:46.720 |
So the problem is about how to pick up functions. 00:22:16.680 |
But you know something, which question to ask. 00:22:24.960 |
But looks like a duck at this general situation. 00:22:50.040 |
And that is the intelligence part of all this business. 00:23:02.960 |
as neural networks, these arbitrary architectures 00:23:15.080 |
What are the weaknesses and what are the possible strengths? 00:23:22.600 |
Everything, which like deep learning, like features. 00:23:32.400 |
One of the greatest books is the Churchill book 00:23:41.800 |
describing that in old times, when a war is over, 00:24:17.360 |
And it was clear for everybody that it is not peace. 00:24:32.160 |
There are mathematicians who are looking for the problem 00:24:35.640 |
from very deep point of view, mathematical point of view. 00:24:48.960 |
And they invented a lot of blah, blah, blah interpretations 00:25:02.520 |
If you'd like to say piecewise linear functions, 00:25:06.680 |
then do it in the class of piecewise linear functions. 00:25:22.200 |
And when it is not enough, they appeal to the brain, 00:25:39.440 |
Try to understand that there is not only one way 00:25:42.960 |
of convergence, which is strong way of convergence. 00:25:52.800 |
you will see that you don't need deep learning. 00:26:03.880 |
It says that the optimal solution of the mathematical problem 00:26:08.880 |
which describes learning is a shallow network. 00:26:39.200 |
where you said throwing something in the bucket 00:26:41.400 |
or the biological example and looking at kings and queens 00:26:50.720 |
or kings and queens and using that as inspiration 00:26:55.480 |
and imagination for where the math will eventually lead you. 00:26:59.100 |
You think that interpretation basically deceives you 00:27:13.920 |
and especially discussion about deep learning, 00:27:21.020 |
not about things, about what you can say about things. 00:27:43.960 |
to find our silly interpretations in these constructs. 00:27:57.040 |
How do you feel about the success of a system 00:28:02.140 |
Using neural networks to estimate the quality of a board 00:28:11.000 |
- That is your interpretation, quality of the board. 00:28:25.320 |
that we don't, I think, mathematically understand that well, 00:28:31.520 |
- That means that it's not a very difficult problem. 00:28:35.120 |
So, you empirically, we empirically have discovered 00:28:55.160 |
it is not the most effective way of learning theory. 00:29:22.920 |
with deep net, using 100 times less training data. 00:29:27.920 |
Even more, some problems deep learning cannot solve, 00:29:54.840 |
But it is possible to create admissible set of functions, 00:30:12.120 |
When you're making training in existing algorithm, 00:30:36.500 |
the swims like a duck and quacks like a duck. 00:30:48.500 |
Clearly, that has evolved in a non-mathematical way. 00:31:02.760 |
and place it in our brain of admissible functions. 00:31:18.800 |
but can you briefly entertain this useless question? 00:31:25.600 |
So, talk about intelligence and your view of it. 00:31:41.660 |
And he understood that it is not a thinking computer. 00:31:53.720 |
So, now we understand that the problem is not in imitation. 00:31:58.160 |
I'm not sure that intelligence just inside of us. 00:32:28.880 |
In the history of science, it's happened all the time. 00:32:56.100 |
they develop something in general which affects everybody. 00:33:28.320 |
- On the flip side of that, maybe you can comment 00:33:38.140 |
by worst case running time in relation to their input. 00:34:09.160 |
VC theory, they did not know; statistical learning they did not know. 00:34:16.880 |
our monographs, but in America they did not know. 00:34:21.520 |
And somebody told me that it is worst case theory, 00:34:34.040 |
You can do only what you can do using mathematics. 00:34:38.440 |
And which has a clear understanding and a clear description. 00:34:43.440 |
And for this reason, we introduce complexity. 00:34:57.120 |
actually it is diversity, I like this one more. 00:35:01.640 |
With VC dimension, you can prove some theorems. 00:35:35.040 |
Because it is not so easy to get good bound, exact bound. 00:35:40.040 |
There are not many cases where you have it; the bound is not exact. 00:35:49.080 |
But it is the interesting principles which discover the math. 00:35:54.080 |
- Do you think it's interesting because it's challenging 00:36:09.900 |
So it's like me judging your life as a human being 00:36:15.640 |
by the worst thing you did and the best thing you did, 00:36:25.480 |
- I don't think so because you cannot describe 00:36:41.880 |
But you cannot describe a model for every new case. 00:36:46.600 |
So you will never be accurate when you're using a model. 00:37:01.960 |
don't you think that the real world has a very long tail? 00:37:06.960 |
That the edge cases are very far away from the mean? 00:37:53.480 |
between the uniform law of large numbers and the law of large numbers. 00:37:56.680 |
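For reference, one standard way to write the distinction (editorial notation, not from the conversation): the law of large numbers controls a single fixed function, while the uniform law controls the whole admissible set at once, which is what VC-type conditions guarantee.

```latex
\text{Law of large numbers (one fixed } f\text{):}\qquad
\frac{1}{\ell}\sum_{i=1}^{\ell} f(z_i) \;\longrightarrow\; \mathbb{E}\, f(z)

\text{Uniform law of large numbers (over the whole class } \mathcal{F}\text{):}\qquad
\sup_{f \in \mathcal{F}} \left| \frac{1}{\ell}\sum_{i=1}^{\ell} f(z_i) - \mathbb{E}\, f(z) \right| \;\longrightarrow\; 0
```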
- Is it useful to describe that a little more? 00:38:01.440 |
- No, for example, when I'm talking about duck, 00:38:07.920 |
But if you try to do it formally, to distinguish, 00:38:19.680 |
- So that means that information about looks like a duck 00:38:29.720 |
So we don't know how many bits of information 00:38:48.080 |
I don't like how people consider artificial intelligence. 00:39:15.840 |
How people try to, how people can develop, 00:39:25.560 |
or play like a butterfly or something like that. 00:39:41.360 |
And you see that connected to the problem of learning? 00:39:56.680 |
- So what is the line of work, would you say? 00:40:00.360 |
If you were to formulate as a set of open problems, 00:40:04.200 |
that will take us there, to play like a butterfly, 00:40:14.360 |
One mathematical story: that if you have a predicate, 00:40:26.440 |
and people have not even started understanding intelligence. 00:40:31.960 |
Because to understand intelligence, first of all, 00:40:44.160 |
- Yeah, so you think we really even haven't started 00:40:51.960 |
We don't even understand that this problem exists. 00:41:02.480 |
I want to understand why one teacher is better than another. 00:41:23.680 |
- Yeah, that's beautiful. So it is a formulation 00:42:02.440 |
And "jumps like a dog" carries zero information. 00:42:30.240 |
At least I can show that there is no other mechanism, 00:43:07.000 |
So you have, say, the MNIST digit recognition problem. 00:43:12.120 |
And deep learning claims that they did it very well. 00:43:31.400 |
What it means, you know, digit one, two, three. 00:43:43.120 |
or say, 100 times fewer examples to do the same job. 00:43:55.080 |
but that last slide was a powerful open challenge 00:44:01.920 |
- Yeah, that is exact problem of intelligence. 00:44:16.960 |
that we use much more training data than humans needed. 00:44:43.400 |
Maybe you will never collect some number of observations. 00:45:02.800 |
we can do a good job with small amount of observations. 00:45:30.120 |
say more than say digit two or something like that. 00:45:34.480 |
But as soon as I get the idea of horizontal symmetry, 00:45:43.800 |
Or even vertical symmetry, or diagonal symmetry, whatever. 00:45:50.880 |
Looking at a digit, I see that it is a meta-predicate. 00:46:07.360 |
like how dark the whole picture is, something like that. 00:46:31.680 |
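To make this concrete, a small editorial sketch (not Vapnik's code): two simple functionals of a grayscale digit image, a horizontal-symmetry score and overall darkness, of the kind that could serve as teacher-supplied predicates alongside the pixels.

```python
import numpy as np

def horizontal_symmetry(img: np.ndarray) -> float:
    """Score in [0, 1]: how close the picture is to its left-right mirror image."""
    diff = np.abs(img - img[:, ::-1]).mean()
    span = img.max() - img.min() + 1e-9
    return float(1.0 - diff / span)

def darkness(img: np.ndarray) -> float:
    """Mean intensity: 'how dark is the whole picture'."""
    return float(img.mean())

# Toy 8x8 "digit": a hollow square, which is perfectly left-right symmetric.
img = np.zeros((8, 8))
img[1:-1, 1] = img[1:-1, -2] = 1.0
img[1, 1:-1] = img[-2, 1:-1] = 1.0
print(horizontal_symmetry(img), darkness(img))
```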
to understand the difference between a two and a three, 00:46:49.600 |
All of that, walking, jumping, looking at ducks. 00:46:58.520 |
the right predicate for telling the difference 00:47:03.080 |
Or do you think there's a more efficient way? 00:47:14.600 |
- Yeah, but maybe there are several languages 00:47:38.800 |
So in one of our articles, it is trivial to show 00:47:43.040 |
that every example can carry not more than one bit 00:47:57.480 |
you can remove, say, a function which does not tell you one. 00:48:14.800 |
you can remove many more functions than half. 00:48:17.920 |
And that means that it contains many bits of information 00:48:39.240 |
And that predicate carries a lot of information. 00:48:52.520 |
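A back-of-the-envelope version of that counting argument (editorial illustration; the numbers are invented): if a labeled example can at best cut the set of candidate functions in half, it carries at most one bit, while a predicate that eliminates a larger fraction carries the base-2 logarithm of its reduction factor.

```python
import math

n_functions = 1_000_000  # size of the current admissible set (illustrative)

# One labeled example can, at best, cut the candidate set in half:
# at most one bit of information about which function is right.
bits_per_example = math.log2(n_functions / (n_functions / 2))      # = 1.0

# A predicate that keeps only 1% of the functions removes far more than half,
# carrying log2(100) ~ 6.6 bits in a single step.
bits_per_predicate = math.log2(n_functions / (n_functions * 0.01))

print(bits_per_example, round(bits_per_predicate, 2))
```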
in your work, which is some of the most profound 00:48:56.040 |
mathematical work in the field of learning AI 00:49:04.080 |
You really kind of talk about philosophy of science. 00:49:09.080 |
There's a poetry and music to a lot of the work you're doing 00:50:08.480 |
gave you a sense that they are touching something 00:50:17.560 |
- Yeah, because when you're listening to Bach, 00:50:42.800 |
you maybe were born as a researcher in Russia, 00:50:53.400 |
what was some of your happiest moments as a researcher? 00:51:15.400 |
- You know, every time when you found something, 00:51:41.800 |
but try to understand how it is related to the ground truth. 00:52:00.000 |
- No, but how is it related to the ground truth, 00:52:18.720 |
So 20 years ago when we discovered statistical learning, 00:52:23.400 |
so nobody believed, except for one guy, Dudley, from MIT. 00:52:41.400 |
- So with support vector machines and learning theory, 00:52:44.280 |
when you were working on it, you had a sense? 00:52:50.160 |
That you had a sense of the profundity of it? 00:52:55.640 |
How that this seems to be right, this seems to be powerful. 00:53:13.280 |
I have a feeling that it is completely wrong. 00:53:21.280 |
Because I have proof that there is no different mechanism. 00:53:24.680 |
You can have some cosmetic improvements, you can do that, 00:53:29.680 |
but in terms of invariants, you need both invariants 00:53:37.080 |
and statistical learning, and they should work together. 00:53:58.240 |
Well, Vladimir, thank you so much for talking today.