
Vladimir Vapnik: Statistical Learning | Lex Fridman Podcast #5


Chapters

0:00 Introduction
1:04 God doesn't play dice
4:08 Is math poetry
7:25 Human intuition
8:44 The role of imagination
9:59 The role of interpretation
12:48 The nature of information
15:58 The English proverb
18:17 An admissible set of functions
20:01 The task of learning
21:05 The process of learning
22:56 Deep learning as neural networks
27:27 The beauty of deep learning
30:34 Can machines think
33:07 Complexity
35:54 Edges
36:53 Learning in the world
39:39 Learning absolute
39:57 Line of work
40:45 Open problem
43:48 Invariance
46:16 The problem of intelligence
48:48 Poetry and music
50:37 Happiest moments
51:52 The possibility of discovery

Whisper Transcript

00:00:00.000 | The following is a conversation with Vladimir Vapnik.
00:00:03.000 | He's the co-inventor of the support vector machine,
00:00:05.240 | support vector clustering, VC theory,
00:00:07.880 | and many foundational ideas in statistical learning.
00:00:11.160 | He was born in the Soviet Union
00:00:13.080 | and worked at the Institute of Control Sciences in Moscow.
00:00:16.240 | Then in the United States,
00:00:18.000 | he worked at AT&T, NEC Labs, Facebook Research,
00:00:22.240 | and now is a professor at Columbia University.
00:00:25.920 | His work has been cited over 170,000 times.
00:00:30.160 | He has some very interesting ideas
00:00:31.840 | about artificial intelligence and the nature of learning,
00:00:34.760 | especially on the limits of our current approaches
00:00:37.560 | and the open problems in the field.
00:00:39.360 | This conversation is part of the MIT course
00:00:42.480 | on artificial general intelligence
00:00:44.240 | and the Artificial Intelligence Podcast.
00:00:46.840 | If you enjoy it, please subscribe on YouTube
00:00:49.560 | or rate it on iTunes or your podcast provider of choice,
00:00:52.960 | or simply connect with me on Twitter
00:00:55.280 | or other social networks at Lex Fridman, spelled F-R-I-D.
00:01:00.160 | And now, here's my conversation with Vladimir Vapnik.
00:01:03.760 | Einstein famously said that God doesn't play dice.
00:01:08.840 | - Yeah.
00:01:09.960 | - You have studied the world through the eyes of statistics,
00:01:12.840 | so let me ask you in terms of the nature of reality,
00:01:17.320 | fundamental nature of reality, does God play dice?
00:01:21.320 | - You don't know some factors.
00:01:25.080 | And because you don't know some factors,
00:01:28.200 | which could be important, it looks like God plays dice.
00:01:33.200 | But we only should describe.
00:01:38.040 | In philosophy, they distinguish between two positions,
00:01:42.120 | positions of instrumentalism,
00:01:44.960 | where you're creating theory for prediction,
00:01:47.800 | and position of realism,
00:01:51.000 | where you're trying to understand what God did.
00:01:54.680 | - Can you describe instrumentalism and realism a little bit?
00:01:58.440 | - For example, if you have some mechanical laws,
00:02:03.280 | what is that?
00:02:05.080 | Is it law which is true always and everywhere,
00:02:11.360 | or is it law which allows you to predict
00:02:14.920 | position of moving element?
00:02:20.160 | What you believe.
00:02:23.000 | You believe that it is God's law,
00:02:25.520 | that God created the world,
00:02:28.520 | which obeys this physical law,
00:02:33.200 | or it is just law for predictions.
00:02:36.280 | - And which one is instrumentalism?
00:02:38.440 | - For predictions.
00:02:39.960 | If you believe that this is law of God,
00:02:43.680 | and it's always true everywhere,
00:02:47.560 | that means that you're realist.
00:02:50.080 | So you're trying to really understand God's thought.
00:02:55.080 | - So the way you see the world is as an instrumentalist?
00:02:58.640 | - You know, I'm working for some models,
00:03:03.280 | model of machine learning.
00:03:07.000 | So in this model, we consider setting,
00:03:11.720 | and we try to solve,
00:03:15.360 | resolve the setting to solve the problem.
00:03:18.320 | And you can do it in two different way,
00:03:20.800 | from the point of view of instrumentalist,
00:03:23.880 | and that's what everybody does now,
00:03:27.160 | because they say the goal of machine learning
00:03:31.640 | is to find the rule for classification.
00:03:36.640 | That is true, but it is instrument for prediction.
00:03:41.000 | But I can say the goal of machine learning
00:03:46.240 | is to learn about conditional probability.
00:03:50.080 | So how God plays dice,
00:03:52.880 | and if he plays, what is probability for one,
00:03:56.000 | what is probability for another given situation?
00:04:00.000 | But for prediction, I don't need this.
00:04:02.680 | I need the rule.
00:04:04.320 | But for understanding, I need conditional probability.
00:04:08.520 | - So let me just step back a little bit first
00:04:10.640 | to talk about, you mentioned, which I read last night,
00:04:14.000 | the parts of the 1960 paper by Eugene Wigner,
00:04:19.000 | "Unreasonable Effectiveness of Mathematics
00:04:23.520 | "and Natural Sciences."
00:04:24.920 | Such a beautiful paper, by the way.
00:04:29.400 | Made me feel, to be honest,
00:04:32.620 | to confess my own work in the past few years
00:04:35.560 | on deep learning, heavily applied,
00:04:38.440 | made me feel that I was missing out
00:04:40.360 | on some of the beauty of nature
00:04:43.440 | in the way that math can uncover.
00:04:45.600 | So let me just step away from the poetry
00:04:49.160 | of that for a second.
00:04:50.400 | How do you see the role of math in your life?
00:04:53.080 | Is it a tool, is it poetry?
00:04:55.600 | Where does it sit?
00:04:57.000 | And does math, for you, have limits
00:05:00.120 | of what it can describe?
00:05:01.420 | - Some people say that math is language which use God.
00:05:06.420 | So I believe--
00:05:10.320 | - Speak to God or use God?
00:05:12.000 | - Use God. - Use God.
00:05:14.080 | - Yeah.
00:05:15.560 | So I believe that this article
00:05:20.560 | about effectiveness, unreasonable effectiveness of math,
00:05:27.820 | is that if you're looking at mathematical structures,
00:05:32.120 | they know something about reality.
00:05:37.720 | And most scientists from natural science,
00:05:42.480 | they're looking on equation and trying to understand reality.
00:05:47.000 | So the same in machine learning.
00:05:50.080 | If you try, very carefully look on all equations
00:05:56.280 | which define conditional probability,
00:06:00.640 | you can understand something about reality
00:06:04.720 | more than from your fantasy.
00:06:08.160 | - So math can reveal the simple underlying principles
00:06:12.480 | of reality, perhaps.
00:06:13.880 | - You know what means simple?
00:06:16.880 | It is very hard to discover them.
00:06:19.120 | But then when you discover them and look at them,
00:06:23.840 | you see how beautiful they are.
00:06:26.800 | And it is surprising why people did not see that before.
00:06:33.600 | You're looking on equation and derive it from equations.
00:06:37.520 | For example, I talked yesterday about least square method.
00:06:42.520 | And people have a lot of fantasy
00:06:45.360 | how to improve least square method.
00:06:48.160 | But if you're going step by step by solving some equations,
00:06:52.400 | you suddenly will get some term which, after thinking,
00:06:57.400 | you understand that it describes position
00:07:02.320 | of observation point.
00:07:04.360 | In least square method, we throw out a lot of information.
00:07:08.240 | We don't look in position of points of observation,
00:07:11.760 | we're looking only on residuals.
00:07:14.600 | But when you understood that, that's very simple idea,
00:07:19.400 | but it's not too simple to understand.
00:07:22.320 | And you can derive this just from equations.
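A minimal numerical sketch of the kind of derivation he is pointing at, with invented toy data. The specific candidate shown here, the leverage taken from the hat matrix, is an editorial guess at which "position of observation point" term he means; it is not named in the conversation.

```python
# Sketch only: ordinary least squares via the normal equations.
# The hat matrix H maps targets to fitted values; its diagonal (the
# leverage) depends on the position of each observation point,
# information that is invisible if you look only at residuals.
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.uniform(-1, 1, 20)])  # design matrix
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 0.1, 20)       # noisy targets

beta = np.linalg.solve(X.T @ X, X.T @ y)   # least squares coefficients
H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix: y_hat = H @ y
leverage = np.diag(H)                      # position of observation points
residuals = y - X @ beta                   # what least squares usually keeps
```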
00:07:25.680 | - So some simple algebra, a few steps will take you
00:07:28.880 | to something surprising that when you think about--
00:07:32.520 | - Absolutely, yes.
00:07:33.360 | - You understand.
00:07:34.360 | - And that is proof that human intuition
00:07:39.320 | is not too rich and is very primitive.
00:07:42.640 | And it does not see very simple situations.
00:07:47.640 | - So let me take a step back.
00:07:50.560 | In general, yes, right?
00:07:53.560 | But what about human, as opposed to intuition,
00:07:58.080 | ingenuity, moments of brilliance?
00:08:03.080 | Do you have to be so hard on human intuition?
00:08:09.480 | Are there moments of brilliance in human intuition
00:08:11.880 | that can leap ahead of math,
00:08:14.900 | and then the math will catch up?
00:08:16.500 | - I don't think so.
00:08:19.440 | I think that the best human intuition,
00:08:23.600 | it is putting in axioms.
00:08:26.480 | And then it is technical way--
00:08:28.800 | - See where the axioms take you.
00:08:30.800 | - Yeah.
00:08:31.920 | But if they correctly take axioms.
00:08:34.980 | And axioms are polished during generations of scientists.
00:08:39.980 | And this is integral wisdom.
00:08:45.080 | - So that's beautifully put.
00:08:47.560 | But if you maybe look at,
00:08:51.000 | when you think of Einstein and special relativity,
00:08:56.840 | what is the role of imagination coming first there
00:09:01.200 | in the moment of discovery of an idea?
00:09:03.620 | So there's obviously a mix of math
00:09:06.440 | and out of the box imagination there.
00:09:10.840 | - That I don't know.
00:09:12.600 | Whatever I did, I exclude any imagination.
00:09:17.600 | Because whatever I saw in machine learning
00:09:20.840 | that come from imagination,
00:09:22.800 | like features, like deep learning,
00:09:26.440 | they are not relevant to the problem.
00:09:28.480 | When you're looking very carefully
00:09:31.960 | from mathematical equations,
00:09:34.160 | you're deriving very simple theory,
00:09:36.720 | which goes theoretically far beyond
00:09:39.640 | whatever people can imagine.
00:09:42.080 | Because it is not good fantasy.
00:09:44.760 | It is just interpretation, it is just fantasy,
00:09:48.040 | but it is not what you need.
00:09:51.340 | You don't need any imagination
00:09:53.920 | to derive, say, main principle of machine learning.
00:09:58.920 | - When you think about learning and intelligence,
00:10:02.760 | maybe thinking about the human brain
00:10:04.560 | and trying to describe mathematically
00:10:06.280 | the process of learning,
00:10:07.620 | that is something like what happens in the human brain.
00:10:13.220 | Do you think we have the tools currently?
00:10:15.840 | Do you think we will ever have the tools
00:10:19.040 | to try to describe that process of learning?
00:10:21.340 | - It is not description of what's going on.
00:10:25.840 | It is interpretation.
00:10:27.400 | It is your interpretation.
00:10:29.440 | Your vision can be wrong.
00:10:32.140 | You know, when the guy invented the microscope,
00:10:36.240 | Leeuwenhoek, for the first time,
00:10:38.960 | only he got this instrument, and nobody else;
00:10:41.500 | he kept the microscope secret.
00:10:45.500 | But he wrote reports to the London Academy of Science.
00:10:49.140 | In his reports, when he was looking at the blood,
00:10:52.060 | he looked everywhere, on the water,
00:10:53.740 | on the blood, on the skin.
00:10:56.380 | But he described blood like fight
00:10:59.860 | between queen and king.
00:11:04.100 | So he saw blood cells, red cells,
00:11:08.180 | and he imagined that it is army fighting each other.
00:11:11.420 | And it was his interpretation of situation.
00:11:15.940 | And he sent this report in Academy of Science.
00:11:19.860 | They very carefully look because they believe
00:11:22.740 | that he is right, he saw something.
00:11:25.740 | But he gave wrong interpretation.
00:11:28.340 | And I believe the same can happen with brain.
00:11:31.340 | - With brain, yeah.
00:11:33.180 | - Because the most important part,
00:11:34.940 | you know, I believe in human language.
00:11:38.940 | In some proverb, is so much wisdom.
00:11:43.100 | For example, people say that better
00:11:48.100 | than thousand days of diligent studies
00:11:52.500 | is one day with great teacher.
00:11:54.040 | But if I will ask you what teacher does, nobody knows.
00:11:59.040 | And that is intelligence.
00:12:01.500 | And but we know from history,
00:12:04.220 | and now from math and machine learning,
00:12:08.780 | that teacher can do a lot.
00:12:12.140 | - So what, from a mathematical point of view,
00:12:14.460 | is the great teacher?
00:12:16.180 | - I don't know.
00:12:17.300 | - That's an open question.
00:12:18.140 | - No, no, no, but we can say what teacher can do.
00:12:23.140 | He can introduce some invariants,
00:12:28.500 | some predicate for creating invariants.
00:12:32.340 | How he doing it, I don't know,
00:12:34.180 | because teacher knows reality,
00:12:36.700 | and can describe from this reality,
00:12:39.140 | a predicate invariants.
00:12:41.280 | But he knows that when you're using invariant,
00:12:43.540 | he can decrease number of observations 100 times.
00:12:47.380 | - So but, maybe try to pull that apart a little bit.
00:12:53.020 | I think you mentioned like a piano teacher saying
00:12:56.500 | to the student, play like a butterfly.
00:12:59.620 | - Yeah.
00:13:00.460 | - I played piano, I played guitar for a long time.
00:13:02.420 | And yeah, that's, there's, maybe it's romantic, poetic,
00:13:09.820 | but it feels like there's a lot of truth in that statement.
00:13:12.600 | Like there is a lot of instruction in that statement.
00:13:15.480 | And so can you pull that apart?
00:13:17.360 | What is that?
00:13:19.840 | The language itself may not contain this information.
00:13:22.640 | - It's not blah, blah, blah.
00:13:24.200 | - It is not blah, blah, blah, yeah.
00:13:25.680 | - It affects you.
00:13:27.000 | - It's what?
00:13:27.840 | - Affect you.
00:13:28.680 | - Yeah.
00:13:29.500 | - Affect your playing.
00:13:30.340 | - Yes, it does, but it's not the lang,
00:13:33.280 | it feels like a, what is the information
00:13:37.080 | being exchanged there?
00:13:38.040 | What is the nature of information?
00:13:39.820 | What is the representation of that information?
00:13:41.960 | - I believe that it is sort of predicate, but I don't know.
00:13:45.440 | That is exactly what intelligence
00:13:48.160 | in machine learning should be.
00:13:49.560 | - Yes.
00:13:50.400 | - Because the rest is just mathematical technique.
00:13:53.200 | I think that what was discovered recently
00:13:57.960 | is that there are two types, two mechanisms of learning.
00:14:02.960 | One is called strong convergence mechanism
00:14:06.080 | and the other weak convergence mechanism.
00:14:08.600 | Before, people used only one convergence.
00:14:11.240 | In weak convergence mechanism, you can use predicate.
00:14:15.880 | That's what play like butterfly
00:14:19.400 | and it will immediately affect your playing.
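Stated symbolically, using the definitions from Vapnik's published work on statistical invariants rather than anything spelled out in the episode (so the notation is an editorial assumption): strong convergence asks the estimate to approach the desired function in norm, while weak convergence only asks for agreement against every predicate:

```latex
\text{strong:}\quad \lim_{\ell\to\infty}\lVert f_\ell - f_0\rVert = 0
\qquad
\text{weak:}\quad \lim_{\ell\to\infty}\int \psi(x)\,\bigl(f_\ell(x)-f_0(x)\bigr)\,dP(x) = 0
\;\;\text{for every predicate } \psi
```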
00:14:23.680 | You know, there is English proverb, great.
00:14:27.360 | If it looks like a duck, swims like a duck,
00:14:31.720 | and quacks like a duck, then it is probably a duck.
00:14:35.240 | - Yes.
00:14:36.320 | - But this is exact about predicate.
00:14:40.440 | Looks like a duck, what it means.
00:14:42.840 | So you saw many ducks, that's your training data.
00:14:46.760 | So you have description of how
00:14:51.760 | an integral duck looks.
00:14:56.520 | - Yeah, the visual characteristics of a duck, yeah.
00:14:59.400 | - Yeah, and you have model
00:15:02.360 | for recognition of ducks.
00:15:04.260 | So you would like the theoretical description
00:15:07.920 | from model to coincide with empirical description
00:15:12.520 | which you saw in the training data.
00:15:14.560 | So about looks like a duck, it is general.
00:15:18.480 | But what about swims like a duck?
00:15:20.600 | You should know the duck swims.
00:15:23.600 | You can say it play chess like a duck.
00:15:26.560 | Okay, duck doesn't play chess.
00:15:28.920 | And it is completely legal predicate, but it is useless.
00:15:34.740 | So a teacher can recognize not useless predicate.
00:15:39.740 | So up to now, we don't use this predicate
00:15:44.700 | in existing machine learning.
00:15:46.740 | - And you think that's absolutely--
00:15:47.580 | - So why we need zillions of data?
00:15:49.360 | But in this English proverb,
00:15:53.500 | they use only three predicates.
00:15:55.620 | Looks like a duck, swims like a duck, and quacks like a duck.
00:15:59.140 | - So you can't deny the fact that swims like a duck
00:16:02.100 | and quacks like a duck has humor in it, has ambiguity.
00:16:07.100 | - Let's talk about swim like a duck.
00:16:10.840 | It doesn't say jumps, jump like a duck.
00:16:17.740 | Because-- - It's not relevant.
00:16:19.400 | - But that means that you know ducks,
00:16:24.100 | you know different birds, you know animals.
00:16:27.660 | And you derive from this that it is relevant
00:16:30.420 | to say swim like a duck.
00:16:32.460 | - So underneath, in order for us to understand
00:16:35.620 | swims like a duck, it feels like we need to know
00:16:39.140 | millions of other little pieces of information
00:16:42.380 | which we pick up along the way.
00:16:44.340 | You don't think so.
00:16:45.180 | There doesn't need to be this knowledge base.
00:16:48.140 | Those statements carry some rich information
00:16:52.660 | that helps us understand the essence of duck.
00:16:57.320 | How far are we from integrating predicates?
00:17:02.040 | - You know that when you consider complete
00:17:05.440 | theory of machine learning.
00:17:07.400 | So what it does, you have a lot of functions.
00:17:11.180 | And then you're talking, it looks like a duck.
00:17:16.440 | You see your training data.
00:17:20.800 | From training data you recognize how
00:17:26.440 | expected duck should look.
00:17:30.220 | Then you remove all functions which do not look
00:17:35.200 | like you think it should look from training data.
00:17:40.120 | So you decrease amount of function
00:17:42.760 | from which you pick up one.
00:17:45.880 | Then you give a second predicate.
00:17:48.360 | And again decrease the set of function.
00:17:51.880 | And after that you pick up the best function you can.
00:17:55.440 | Fine, it is standard machine learning.
00:17:58.160 | So why you need not too many examples.
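A toy sketch of exactly the procedure just described, with invented data and a hypothetical class of linear functions. The three predicates are stand-ins for "looks, swims, and quacks like a duck"; none of the names, numbers, or tolerances come from the conversation, and the tolerance may need loosening for the surviving set to stay non-empty.

```python
# Sketch only: each predicate shrinks the admissible set of functions;
# the classifier is then picked from whatever survives, which is why
# few labeled examples are needed at the final step.
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (30, 2))              # toy training inputs
y = (X[:, 0] + X[:, 1] > 0).astype(float)    # toy training labels

# Crude stand-in for a big set of functions: random linear classifiers.
candidates = [rng.normal(size=2) for _ in range(5000)]
predict = lambda w: (X @ w > 0).astype(float)

def satisfies(w, psi, tol=0.1):
    """Invariant: average of psi * prediction matches average of psi * label."""
    p = psi(X)
    return abs(np.mean(p * predict(w)) - np.mean(p * y)) < tol

predicates = [lambda Z: np.ones(len(Z)),     # "looks like": overall rate
              lambda Z: Z[:, 0],             # "swims like": one feature cue
              lambda Z: Z[:, 1]]             # "quacks like": another cue

for psi in predicates:                       # each predicate cuts the set down
    candidates = [w for w in candidates if satisfies(w, psi)]

# Pick the best surviving function on the (small) training sample.
best = max(candidates, key=lambda w: np.mean(predict(w) == y))
```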
00:18:01.900 | - 'Cause your predicates aren't very good.
00:18:05.520 | Or you're not--
00:18:06.360 | - Yeah, that means the predicates are very good.
00:18:09.240 | Because every predicate is invented
00:18:12.600 | to decrease admissible set of function.
00:18:15.840 | - So you talk about admissible set of functions
00:18:20.400 | and you talk about good functions.
00:18:22.480 | So what makes a good function?
00:18:24.360 | - So admissible set of functions is a set of functions
00:18:28.640 | which has small capacity or small diversity,
00:18:32.800 | small VC dimension, for example,
00:18:35.200 | which contains good function inside.
00:18:37.000 | - So by the way for people who don't know,
00:18:38.840 | VC, you're the V in the VC.
00:18:42.560 | So how would you describe to a layperson what VC theory is?
00:18:49.320 | How would you describe VC?
00:18:51.360 | - When you have a machine,
00:18:53.520 | so machine capable to pick up one function
00:18:59.040 | from the admissible set of function.
00:19:01.080 | But set of admissible function can be big.
00:19:06.600 | So it contain all continuous functions and it's useless.
00:19:11.720 | You don't have so many examples to pick up function.
00:19:15.400 | But it can be small.
00:19:17.360 | Small, we call it capacity,
00:19:21.560 | but maybe better call diversity.
00:19:24.640 | So not very different function in the set.
00:19:27.200 | It's infinite set of function, but not very diverse.
00:19:31.360 | So it is small VC dimension.
00:19:34.360 | When VC dimension is small,
00:19:36.040 | you need small amount of training data.
00:19:40.480 | So the goal is to create admissible set of functions
00:19:47.320 | which is, have small VC dimension
00:19:51.320 | and contain good function.
00:19:53.200 | Then you should, you will be able to pick up the function
00:19:58.200 | using small amount of observations.
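The classical VC bound, quoted here from the standard statistical learning literature as background rather than from the episode, makes this quantitative: with probability at least $1-\eta$, simultaneously for all functions $f$ in a set of VC dimension $h$,

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f)
\;+\; \sqrt{\frac{h\left(\ln\frac{2\ell}{h}+1\right)-\ln\frac{\eta}{4}}{\ell}}
```

where $R$ is the expected risk and $R_{\mathrm{emp}}$ the empirical risk on $\ell$ observations; when $h$ is small, the two stay close even for small $\ell$.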
00:20:00.360 | - So that is the task of learning.
00:20:06.120 | - Yeah.
00:20:06.960 | - Is creating a set of admissible functions
00:20:11.360 | that has a small VC dimension.
00:20:13.120 | And then you've figured out a clever way of picking up.
00:20:17.280 | - No, that is goal of learning,
00:20:19.600 | which I formulated yesterday.
00:20:22.440 | Statistical learning theory does not involve
00:20:26.800 | in creating admissible set of function.
00:20:30.360 | In classical learning theory,
00:20:32.440 | everywhere, 100% in textbook,
00:20:35.520 | the set of function, admissible set of function is given.
00:20:39.200 | But this is science about nothing
00:20:41.760 | because the most difficult problem is
00:20:44.040 | to create admissible set of functions
00:20:47.120 | given, say, a lot of functions,
00:20:51.160 | a continuum set of functions.
00:20:53.080 | Create admissible set of functions,
00:20:55.000 | that means that it has finite VC dimension,
00:20:58.800 | small VC dimension, and contain good function.
00:21:02.280 | So this was out of consideration.
00:21:05.280 | - So what's the process of doing that?
00:21:07.240 | I mean, it's fascinating.
00:21:08.280 | What is the process of creating this
00:21:10.440 | admissible set of functions?
00:21:13.200 | - That is invariance.
00:21:14.960 | - That's invariance.
00:21:15.800 | Can you describe invariance?
00:21:17.280 | - Yeah, you're looking at properties of training data.
00:21:22.440 | And properties means that you have some function
00:21:27.440 | and you just count what is value,
00:21:35.080 | average value of function on training data.
00:21:37.800 | You have model and what is expectation
00:21:43.040 | of this function on the model.
00:21:44.960 | And they should coincide.
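In symbols, the invariant he describes reads as follows; the predicate $\psi$ and the conditional-probability notation are an editorial restatement in the spirit of his statistical-invariants papers, not a quote:

```latex
\frac{1}{\ell}\sum_{i=1}^{\ell}\psi(x_i)\,P(y{=}1\mid x_i)
\;\approx\;
\frac{1}{\ell}\sum_{i=1}^{\ell}\psi(x_i)\,y_i
```

that is, the expectation of the function under the model must agree with its average over the training data.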
00:21:46.720 | So the problem is about how to pick up functions.
00:21:51.720 | It can be any function.
00:21:53.600 | In fact, it is true for all functions.
00:22:00.640 | But because when we're talking, say,
00:22:05.640 | duck does not jump,
00:22:08.080 | so you don't ask question, jumps like a duck.
00:22:11.240 | Because it is trivial, it does not jump
00:22:14.320 | and doesn't help you to recognize duck.
00:22:16.680 | But you know something, which question to ask.
00:22:20.360 | When you're asking, it swims like a duck.
00:22:24.960 | But looks like a duck, it is a general situation.
00:22:28.080 | Looks like, say, a guy who has this illness,
00:22:33.080 | this disease, it is legal.
00:22:37.840 | So there is a general type of predicate,
00:22:43.360 | looks like, and special type of predicate,
00:22:47.280 | which related to this specific problem.
00:22:50.040 | And that is intelligence part of all this business.
00:22:54.160 | And that where teacher is involved.
00:22:56.320 | - Incorporating the specialized predicates.
00:22:59.120 | Okay, what do you think about deep learning
00:23:02.960 | as neural networks, these arbitrary architectures
00:23:07.960 | as helping accomplish some of the tasks
00:23:12.200 | you're thinking about?
00:23:13.280 | Their effectiveness or lack thereof?
00:23:15.080 | What are the weaknesses and what are the possible strengths?
00:23:20.080 | - You know, I think that this is fantasy.
00:23:22.600 | Everything, which like deep learning, like features.
00:23:28.520 | Let me give you this example.
00:23:32.400 | One of the greatest book, this Churchill book
00:23:36.360 | about history of Second World War.
00:23:39.240 | And he's starting this book,
00:23:41.800 | describing that in old time, when war is over,
00:23:46.800 | so the great kings, they gather together,
00:23:53.080 | and most all of them were relatives.
00:23:58.000 | And they discussed what should be done,
00:24:00.600 | how to create peace.
00:24:02.920 | And they came to agreement.
00:24:05.120 | And when happens First World War,
00:24:08.760 | the general public came in power.
00:24:13.600 | And they were so greedy that robbed Germany.
00:24:17.360 | And it was clear for everybody that it is not peace.
00:24:21.960 | That peace will last only 20 years.
00:24:24.760 | Because they was not professionals.
00:24:28.800 | And the same I see in machine learning.
00:24:32.160 | There are mathematicians who are looking for the problem
00:24:35.640 | from very deep point of view, mathematical point of view.
00:24:40.120 | And there are computer scientists,
00:24:43.120 | who mostly does not know mathematics.
00:24:46.320 | They just have interpretation of that.
00:24:48.960 | And they invented a lot of blah, blah, blah interpretations
00:24:52.520 | like deep learning.
00:24:53.960 | Why you did deep learning?
00:24:55.280 | Mathematic does not know deep learning.
00:24:57.640 | Mathematic does not know neurons.
00:25:00.840 | It is just function.
00:25:02.520 | If you like to say, piecewise linear function,
00:25:05.400 | say that.
00:25:06.680 | And do it in class of piecewise linear function.
00:25:10.840 | But they invent something.
00:25:12.880 | And then they try to prove advantage of that
00:25:17.880 | through interpretations, which mostly wrong.
00:25:22.200 | And when it not enough, they appeal to brain,
00:25:25.760 | which they know nothing about that.
00:25:27.600 | Nobody knows what is going on in the brain.
00:25:30.400 | So I think that it is more reliable to look on math.
00:25:34.760 | This is mathematical problem.
00:25:36.880 | Do your best to solve this problem.
00:25:39.440 | Try to understand that there is not only one way
00:25:42.960 | of convergence, which is strong way of convergence.
00:25:46.160 | There is a weak way of convergence,
00:25:48.160 | which requires predicate.
00:25:49.960 | And if you will go through all this stuff,
00:25:52.800 | you will see that you don't need deep learning.
00:25:56.560 | Even more, I would say one of the theorems,
00:26:01.000 | which is called representer theorem,
00:26:03.880 | says that optimal solution of the mathematical problem
00:26:08.880 | which describes learning is on shallow network.
00:26:16.280 | Not on deep network.
00:26:21.120 | - And a shallow network, yeah.
00:26:22.360 | The ultimate problem is there.
00:26:24.320 | Absolutely, so in the end,
00:26:26.960 | what you're saying is exactly right.
00:26:29.360 | The question is, you have no value
00:26:32.680 | for throwing something on the table,
00:26:35.480 | playing with it, not math.
00:26:38.080 | It's like in neural network,
00:26:39.200 | where you said throwing something in the bucket
00:26:41.400 | or the biological example and looking at kings and queens
00:26:45.520 | or the cells with a microscope.
00:26:47.320 | You don't see value in imagining the cells
00:26:50.720 | or kings and queens and using that as inspiration
00:26:55.480 | and imagination for where the math will eventually lead you.
00:26:59.100 | You think that interpretation basically deceives you
00:27:03.720 | in a way that's not productive.
00:27:05.520 | - I think that if you're trying to analyze
00:27:09.760 | this business of learning,
00:27:13.920 | and especially discussion about deep learning,
00:27:18.380 | it is discussion about interpretation,
00:27:21.020 | not about things, about what you can say about things.
00:27:26.020 | - That's right, but aren't you surprised
00:27:27.600 | by the beauty of it?
00:27:29.040 | So, not mathematical beauty,
00:27:32.560 | but the fact that it works at all.
00:27:35.640 | Or you're criticizing that very beauty,
00:27:38.960 | our human desire to interpret,
00:27:43.960 | to find our silly interpretations in these constructs.
00:27:49.440 | Like, let me ask you this.
00:27:51.300 | Are you surprised, does it inspire you?
00:27:57.040 | How do you feel about the success of a system
00:27:59.080 | like AlphaGo at beating the game of Go?
00:28:02.140 | Using neural networks to estimate the quality of a board
00:28:08.160 | and the quality of the--
00:28:11.000 | - That is your interpretation, quality of the board.
00:28:14.400 | - Yeah, yes.
00:28:15.480 | - Yeah.
00:28:16.320 | (laughing)
00:28:17.160 | - But it's not our interpretation.
00:28:20.240 | The fact is, a neural network system,
00:28:22.560 | doesn't matter, a learning system
00:28:25.320 | that we don't, I think, mathematically understand that well,
00:28:28.200 | beats the best human player,
00:28:29.680 | does something that was thought impossible.
00:28:31.520 | - That means that it's not a very difficult problem.
00:28:34.120 | - That's it.
00:28:35.120 | So, you empirically, we empirically have discovered
00:28:37.920 | that this is not a very difficult problem.
00:28:40.400 | - Yeah.
00:28:41.240 | (laughing)
00:28:42.160 | - It's true.
00:28:43.000 | So, maybe, can't argue.
00:28:50.280 | - Even more, I would say,
00:28:52.480 | that if they use deep learning,
00:28:55.160 | it is not the most effective way of learning theory.
00:29:00.120 | And usually, when people use deep learning,
00:29:03.920 | they're using zillions of training data.
00:29:07.560 | - Yeah.
00:29:10.600 | - But you don't need this.
00:29:13.440 | So, I described a challenge:
00:29:16.000 | can we do some problems, which deep learning
00:29:21.000 | method does well with deep net,
00:29:22.920 | using 100 times less training data.
00:29:27.920 | Even more, some problems deep learning cannot solve,
00:29:32.960 | because it does not necessarily
00:29:37.840 | create admissible set of functions.
00:29:40.840 | To create deep architecture means
00:29:43.360 | to create admissible set of functions.
00:29:45.760 | You cannot say that you're creating
00:29:47.360 | good admissible set of functions.
00:29:49.280 | You're just, it's your fantasy.
00:29:52.700 | It does not come from math.
00:29:54.840 | But it is possible to create admissible set of functions,
00:29:58.680 | because you have your training data.
00:30:01.020 | That actually, for mathematicians,
00:30:04.520 | when you consider invariant,
00:30:08.760 | you need to use law of large numbers.
00:30:12.120 | When you're making training in existing algorithm,
00:30:17.120 | you need uniform law of large numbers,
00:30:20.720 | which is much more difficult,
00:30:22.580 | it requires VC dimension and all this stuff.
00:30:25.100 | But, nevertheless, if you use both,
00:30:30.100 | weak and strong way of convergence,
00:30:32.900 | you can decrease a lot of training data.
00:30:35.020 | - Yeah, you could do the three,
00:30:36.500 | the swims like a duck and quacks like a duck.
00:30:39.180 | - Yeah, yeah.
00:30:40.020 | - So, let's step back and think about
00:30:45.820 | human intelligence in general.
00:30:48.500 | Clearly, that has evolved in a non-mathematical way.
00:30:52.580 | It wasn't, as far as we know,
00:30:57.760 | God or whoever didn't come up with a model
00:31:02.760 | and place it in our brain of admissible functions.
00:31:05.880 | It kind of evolved.
00:31:06.800 | I don't know, maybe you have a view on this.
00:31:09.640 | So, Alan Turing in the '50s, in his paper,
00:31:14.280 | asked and rejected the question,
00:31:15.880 | can machines think?
00:31:17.360 | It's not a very useful question,
00:31:18.800 | but can you briefly entertain this useless question?
00:31:23.800 | Can machines think?
00:31:25.600 | So, talk about intelligence and your view of it.
00:31:28.640 | - I don't know that.
00:31:30.040 | I know that Turing described imitation.
00:31:35.040 | If computer can imitate human being,
00:31:39.780 | let's call it intelligent.
00:31:41.660 | And he understands that it is not thinking computer.
00:31:45.720 | - Yes.
00:31:46.560 | - He completely understands what he's doing.
00:31:49.360 | But he set up problem of imitation.
00:31:53.720 | So, now we understand that the problem not in imitation.
00:31:58.160 | I'm not sure that intelligence just inside of us.
00:32:03.160 | It may be also outside of us.
00:32:06.720 | I have several observations.
00:32:09.280 | So, when I prove some theorem,
00:32:13.040 | it's very difficult theorem,
00:32:15.200 | but in couple of years, in several places,
00:32:19.120 | people prove the same theorem.
00:32:21.600 | Say, Sauer's lemma was done after us.
00:32:25.960 | Then another guys prove the same theorem.
00:32:28.880 | In the history of science, it's happened all the time.
00:32:32.200 | For example, geometry.
00:32:34.960 | It's happened simultaneously.
00:32:36.440 | First Lobachevsky did it, and then Gauss,
00:32:40.520 | and Bolyai, and other guys.
00:32:42.800 | And it was approximately
00:32:46.200 | in a 10-year period of time.
00:32:47.920 | And I saw a lot of examples like that.
00:32:51.680 | And many mathematicians thinks that
00:32:54.060 | when they develop something,
00:32:56.100 | they develop something in general which affect everybody.
00:33:01.100 | So, maybe our models that intelligence
00:33:04.520 | only inside of us is incorrect.
00:33:07.220 | - It's our interpretation, yeah.
00:33:09.240 | It may be there exist some connection
00:33:13.000 | with world intelligence.
00:33:15.840 | I don't know.
00:33:16.680 | - You're almost like plugging in into--
00:33:19.680 | - Yeah, exactly.
00:33:21.200 | - And contributing to this--
00:33:22.600 | - Into big network.
00:33:24.280 | - Into a big, maybe a neural network.
00:33:26.560 | - No, no, no, no.
00:33:28.320 | - On the flip side of that, maybe you can comment
00:33:30.960 | on big O complexity,
00:33:34.840 | and how you see classifying algorithms
00:33:38.140 | by worst case running time in relation to their input.
00:33:42.180 | So that way of thinking about functions.
00:33:44.760 | Do you think P equals NP?
00:33:47.400 | Do you think that's an interesting question?
00:33:49.800 | - Yeah, it is interesting question.
00:33:51.840 | But let me talk about complexity
00:33:56.840 | in about worst case scenario.
00:33:59.480 | There is a mathematical setting.
00:34:04.200 | When I came to United States in 1990,
00:34:08.080 | people did not know
00:34:09.160 | VC theory, they did not know statistical learning.
00:34:12.840 | In Russia there were published two monographs,
00:34:16.880 | our monographs, but in America they did not know.
00:34:20.360 | Then they learned it.
00:34:21.520 | And somebody told me that it is worst case theory,
00:34:25.880 | and they will create real case theory.
00:34:27.720 | But till now it did not.
00:34:30.440 | Because it is mathematical tool.
00:34:34.040 | You can do only what you can do using mathematics.
00:34:38.440 | And which has clear understanding, and clear description.
00:34:43.440 | And for this reason, we introduce complexity.
00:34:50.740 | And we need this, because using,
00:34:57.120 | actually it is diversity, I like this one more.
00:35:05.120 | With VC dimension, you can prove some theorems.
00:35:05.120 | But we also create theory for case
00:35:09.320 | when you know probability measure.
00:35:12.600 | And that is the best case which can happen,
00:35:14.760 | it is entropy theory.
00:35:16.760 | So from mathematical point of view,
00:35:20.460 | you know the best possible case,
00:35:22.640 | and the worst possible case.
00:35:25.180 | You can derive different models in the middle.
00:35:28.480 | But it's not so interesting.
00:35:30.360 | - You think the edges are interesting?
00:35:33.400 | - The edges are interesting.
00:35:35.040 | Because it is not so easy to get good bound, exact bound.
00:35:40.040 | There are not many cases where you have exact bound.
00:35:49.080 | But there are interesting principles which the math discovers.
00:35:54.080 | - Do you think it's interesting because it's challenging
00:35:57.640 | and reveals interesting principles
00:36:00.360 | that allow you to get those bounds?
00:36:02.600 | Or do you think it's interesting
00:36:04.220 | because it's actually very useful
00:36:05.780 | for understanding the essence of a function,
00:36:08.260 | of an algorithm?
00:36:09.900 | So it's like me judging your life as a human being
00:36:15.640 | by the worst thing you did and the best thing you did,
00:36:19.000 | versus all the stuff in the middle.
00:36:20.800 | It seems not productive.
00:36:25.480 | - I don't think so because you cannot describe
00:36:29.000 | situation in the middle.
00:36:31.320 | Or it will be not general.
00:36:34.480 | So you can describe edges cases,
00:36:38.920 | and it is clear it has some model.
00:36:41.880 | But you cannot describe model for every new case.
00:36:46.600 | So you will be never accurate when you're using model.
00:36:52.520 | - But from a statistical point of view,
00:36:55.120 | the way you've studied functions
00:36:57.880 | and the nature of learning in the world,
00:37:01.960 | don't you think that the real world has a very long tail?
00:37:06.960 | That the edge cases are very far away from the mean?
00:37:12.480 | The stuff in the middle, or no?
00:37:17.420 | - I don't know that.
00:37:21.120 | - Because--
00:37:21.960 | - I think that, from my point of view,
00:37:26.960 | if you will use formal statistics,
00:37:33.640 | you need uniform law of large numbers.
00:37:38.360 | If you will use this invariance business,
00:37:45.240 | you will need just law of large numbers.
00:37:50.960 | You don't, and there's this huge difference
00:37:53.480 | between uniform law of large numbers and law of large numbers.
00:37:56.680 | - Is it useful to describe that a little more?
00:37:58.840 | Or should we just take it to--
00:38:01.440 | - No, for example, when I'm talking about duck,
00:38:05.320 | I gave three predicates and that was enough.
00:38:07.920 | But if you will try to do formal distinguishing,
00:38:14.560 | you will need a lot of observation.
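For reference, the distinction being invoked, with standard definitions supplied editorially rather than quoted from the episode: the plain law of large numbers controls one fixed function, while the uniform version must control every function in the admissible set at once, which is where VC dimension and large samples enter:

```latex
\text{LLN (one fixed } \psi\text{):}\quad
\frac{1}{\ell}\sum_{i=1}^{\ell}\psi(x_i)\;\xrightarrow[\ell\to\infty]{}\;\mathbb{E}\,\psi(x)
\qquad
\text{uniform LLN (whole set } F\text{):}\quad
\sup_{f\in F}\left|\frac{1}{\ell}\sum_{i=1}^{\ell}f(x_i)-\mathbb{E}\,f(x)\right|
\;\xrightarrow[\ell\to\infty]{}\;0
```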
00:38:16.920 | - I gotcha.
00:38:19.680 | - So that means that information about looks like a duck
00:38:23.640 | contain a lot of bit of information,
00:38:27.200 | formal bits of information.
00:38:29.720 | So we don't know how many bits of information
00:38:34.720 | these things from artificial intelligence contain.
00:38:39.800 | And that is the subject of analysis.
00:38:42.880 | Till now, in all this business,
00:38:48.080 | I don't like how people consider artificial intelligence.
00:38:53.080 | They consider it as some codes which imitate
00:38:58.280 | activity of human being.
00:39:00.280 | It is not science, it is applications.
00:39:03.920 | You would like to imitate, go ahead,
00:39:05.760 | it is very useful and good problem.
00:39:09.480 | But you need to learn something more.
00:39:15.840 | How people try, how people can develop,
00:39:20.560 | say, predicates, swims like a duck,
00:39:25.560 | or play like butterfly or something like that.
00:39:29.080 | It's not that the teacher says it to you;
00:39:32.040 | it is how it came in his mind,
00:39:34.400 | how he chooses image.
00:39:37.000 | - So that process--
00:39:37.840 | - That is problem of intelligence.
00:39:39.880 | - That is the problem of intelligence.
00:39:41.360 | And you see that connected to the problem of learning?
00:39:44.880 | - Absolutely.
00:39:45.720 | Because you immediately give this predicate,
00:39:48.880 | like specific predicate, swims like a duck,
00:39:52.320 | or quack like a duck.
00:39:53.920 | It was chosen somehow.
00:39:56.680 | - So what is the line of work, would you say?
00:40:00.360 | If you were to formulate as a set of open problems,
00:40:04.200 | that will take us there, to play like a butterfly,
00:40:09.640 | we'll get a system to be able to--
00:40:11.960 | - Let's separate two stories.
00:40:14.360 | One mathematical story, that if you have predicate,
00:40:18.240 | you can do something.
00:40:19.320 | And another story, how to get predicate.
00:40:23.680 | It is intelligence problem,
00:40:26.440 | and people even did not start understanding intelligence.
00:40:31.960 | Because to understand intelligence, first of all,
00:40:35.280 | try to understand what doing teachers.
00:40:37.400 | How teacher teach.
00:40:40.840 | Why one teacher better than another one?
00:40:44.160 | - Yeah, so you think we really even haven't started
00:40:47.800 | on the journey of generating the predicates?
00:40:50.280 | - No, we don't understand.
00:40:51.960 | We even don't understand that this problem exists.
00:40:54.960 | Because did you hear-- - You do.
00:40:58.480 | - No, I just know name.
00:41:02.480 | I want to understand why one teacher better than another.
00:41:07.120 | And how teacher affects student.
00:41:13.400 | It is not because he is repeating the problem
00:41:16.320 | which is in textbook.
00:41:17.600 | He makes some remarks.
00:41:19.920 | He makes some philosophy of reasoning.
00:41:23.680 | - Yeah, that's a beautiful, so it is a formulation
00:41:27.280 | of a question that is the open problem.
00:41:31.320 | Why is one teacher better than another?
00:41:33.840 | - Right.
00:41:35.240 | What he does better.
00:41:36.360 | - Yeah, what, why at every level?
00:41:42.920 | How do they get better?
00:41:44.840 | What does it mean to be better?
00:41:46.800 | The whole-- - Yeah, yeah.
00:41:50.320 | From whatever model I have,
00:41:53.000 | one teacher can give a very good predicate.
00:41:56.720 | One teacher can say, "Swims like a dog."
00:42:00.440 | And another can say, "Jump like a dog."
00:42:02.440 | And jump like a dog carries zero information.
00:42:07.160 | - Yeah. (laughs)
00:42:09.320 | So what is the most exciting problem
00:42:11.360 | in statistical learning?
00:42:12.880 | Ever worked on or are working on now?
00:42:15.200 | - I just finished this invariant story.
00:42:21.280 | And I'm happy that I believe
00:42:25.240 | that it is ultimate learning story.
00:42:30.240 | At least I can show that there is no other mechanism,
00:42:35.120 | only two mechanisms.
00:42:36.640 | But they separate statistical part
00:42:42.480 | from intelligent part.
00:42:43.920 | And I know nothing about intelligent part.
00:42:47.360 | And if we will know this intelligent part,
00:42:51.600 | so it will help us a lot in teaching,
00:42:56.800 | in learning.
00:42:59.200 | - In learning. - Yeah.
00:43:00.040 | - Do you think we'll know it when we see it?
00:43:02.840 | - So for example, in my talk,
00:43:04.880 | the last slide was a challenge.
00:43:07.000 | So you have, say, MNIST digit recognition problem.
00:43:12.120 | And deep learning claims that they did it very well.
00:43:16.800 | Say, 99.5% of correct answers.
00:43:21.080 | But they use 60,000 observations.
00:43:24.440 | - Yeah.
00:43:25.280 | - Can you do the same using 100 times less?
00:43:27.840 | But incorporating invariants.
00:43:31.400 | What it means, you know, digit one, two, three.
00:43:34.520 | - Yeah.
00:43:35.360 | - Just looking at that,
00:43:36.360 | explain me which invariant I should keep
00:43:40.560 | to use 100 examples,
00:43:43.120 | or say, 100 times less examples to do the same job.
00:43:46.760 | - Yeah, that last slide,
00:43:50.720 | and unfortunately, your talk ended quickly,
00:43:55.080 | but that last slide was a powerful open challenge
00:43:59.360 | in a formulation of the essence here.
00:44:01.920 | - Yeah, that is exact problem of intelligence.
00:44:04.800 | Because everybody,
00:44:09.800 | when machine learning started,
00:44:11.920 | and it was developed by mathematicians,
00:44:14.920 | they immediately recognized
00:44:16.960 | that we use much more training data than humans needed.
00:44:21.480 | But now again, we came to the same story.
00:44:25.560 | Have to decrease.
00:44:27.360 | And that is the problem of learning.
00:44:30.560 | It is not like in deep learning,
00:44:32.520 | they use zillions of training data.
00:44:35.120 | Because maybe zillions are not enough
00:44:38.400 | if you don't have good invariants.
00:44:43.400 | Maybe you will never collect some number of observations.
00:44:49.400 | But now it is a question to intelligence.
00:44:54.080 | How to do that?
00:44:55.840 | Because statistical part is ready.
00:44:58.440 | As soon as you supply us with predicate,
00:45:02.800 | we can do a good job with small amount of observations.
00:45:06.800 | And the very first challenge is well-known
00:45:09.520 | digit recognition.
00:45:10.960 | And you know digits.
00:45:12.280 | And please, tell me invariants.
00:45:15.640 | I think about that, I can say,
00:45:17.760 | for digit three, I would introduce concept
00:45:22.360 | of horizontal symmetry.
00:45:24.800 | So the digit three has horizontal symmetry,
00:45:30.120 | say more than say digit two or something like that.
00:45:34.480 | But as soon as I get the idea of horizontal symmetry,
00:45:38.240 | I can mathematically invent a lot of measure
00:45:41.760 | of horizontal symmetry.
00:45:43.800 | Or even vertical symmetry, or diagonal symmetry, whatever.
00:45:47.280 | If I have idea of symmetry.
00:45:48.920 | But what else?
00:45:50.880 | Looking on digit, I see that it is meta-predicate.
00:45:57.480 | Which is not shape.
00:46:04.280 | It is something like symmetry,
00:46:07.360 | like how dark is whole picture, something like that.
00:46:11.560 | Which can serve as a predicate.
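A toy version of the two predicates he names, written for a 28x28 grayscale digit image with values in [0, 1]. The function names and exact formulas are editorial assumptions, since he deliberately leaves the choice of symmetry measure open.

```python
# Sketch only: candidate invariants for digit images in the spirit of
# the MNIST challenge described above.
import numpy as np

def horizontal_symmetry(img: np.ndarray) -> float:
    """1.0 when the image equals its top-bottom mirror, lower otherwise.

    A '3' is roughly symmetric about a horizontal axis, so it should
    score higher than a '2'.
    """
    flipped = img[::-1, :]                    # mirror across horizontal axis
    total = img.sum() + flipped.sum()
    if total == 0:
        return 1.0
    return 1.0 - np.abs(img - flipped).sum() / total

def darkness(img: np.ndarray) -> float:
    """Mean ink: the 'how dark is the whole picture' predicate."""
    return float(img.mean())
```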
00:46:16.200 | - You think such a predicate could rise
00:46:18.680 | out of something that's not general?
00:46:25.240 | Meaning, it feels like for me to be able
00:46:31.680 | to understand the difference between a two and a three,
00:46:35.360 | I would need to have had a childhood
00:46:39.480 | of 10 to 15 years playing with kids,
00:46:49.600 | going to school, being yelled at by parents.
00:46:49.600 | All of that, walking, jumping, looking at ducks.
00:46:55.880 | And now, then I would be able to generate
00:46:58.520 | the right predicate for telling the difference
00:47:01.360 | between two and a three.
00:47:03.080 | Or do you think there's a more efficient way?
00:47:05.880 | - I don't know.
00:47:07.360 | I know for sure that you must know something
00:47:11.040 | more than digits.
00:47:12.520 | - Yes, and that's a powerful statement.
00:47:14.600 | - Yeah, but maybe there are several languages
00:47:19.400 | of description, these elements of digits.
00:47:24.400 | So I'm talking about symmetry,
00:47:27.120 | about some properties of geometry,
00:47:30.160 | I'm talking about something abstract.
00:47:32.880 | I don't know that.
00:47:34.600 | But this is a problem of intelligence.
00:47:38.800 | So in one of our articles, it is trivial to show
00:47:43.040 | that every example can carry not more than one bit
00:47:47.920 | of information in real.
00:47:49.960 | Because when you show example,
00:47:54.160 | and you say this is one,
00:47:57.480 | you can remove, say, functions which do not tell you one.
00:48:02.480 | The best strategy,
00:48:05.000 | if you can do it perfectly,
00:48:06.840 | is to remove half of the functions.
00:48:10.040 | But when you use one predicate,
00:48:12.600 | which looks like a duck,
00:48:14.800 | you can remove much more functions than half.
00:48:17.920 | And that means that it contains a lot of bit of information
00:48:22.280 | from formal point of view.
00:48:25.960 | But when you have a general picture
00:48:30.960 | of what you want to recognize,
00:48:33.800 | a general picture of the world,
00:48:35.600 | can you invent this predicate?
00:48:39.240 | And that predicate carry a lot of information.
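The counting behind this claim, supplied editorially since no formula is given in the episode: a labeled example that at best eliminates half of the admissible functions carries $\log_2 2 = 1$ bit, while a predicate that keeps only a fraction $p$ of the functions carries

```latex
I(\psi) \;=\; \log_2\frac{1}{p}\ \text{bits},
\qquad\text{e.g. } p = \tfrac{1}{1000}\;\Rightarrow\; I(\psi)\approx 10\ \text{bits.}
```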
00:48:45.320 | - Beautifully put.
00:48:48.680 | Maybe just me, but in all the math you show,
00:48:52.520 | in your work, which is some of the most profound
00:48:56.040 | mathematical work in the field of learning AI
00:48:59.360 | and just math in general,
00:49:00.800 | I hear a lot of poetry and philosophy.
00:49:04.080 | You really kind of talk about philosophy of science.
00:49:09.080 | There's a poetry and music to a lot of the work you're doing
00:49:12.520 | and the way you're thinking about it.
00:49:14.080 | So where does that come from?
00:49:16.640 | Do you escape to poetry?
00:49:18.840 | Do you escape to music or not?
00:49:21.080 | - I think that there exists ground truth.
00:49:24.120 | - There exists ground truth?
00:49:25.640 | - Yeah, and that can be seen everywhere.
00:49:29.560 | The smart guys, philosophers,
00:49:32.160 | sometimes I am surprised how deep they see.
00:49:37.160 | Sometimes I see that some of them
00:49:40.520 | are completely out of subject.
00:49:43.880 | But the ground truth I see in music.
00:49:50.680 | - Music is the ground truth?
00:49:52.000 | - Yeah.
00:49:52.840 | And in poetry, many poets,
00:49:55.640 | they believe that they take dictation.
00:50:00.640 | (laughing)
00:50:01.800 | - So what piece of music,
00:50:05.400 | as a piece of empirical evidence,
00:50:08.480 | gave you a sense that they are touching something
00:50:12.800 | in the ground truth?
00:50:14.520 | - It is structure.
00:50:15.520 | - The structure of the math of music.
00:50:17.560 | - Yeah, because when you're listening to Bach,
00:50:19.960 | you see the structure.
00:50:22.160 | Very clear, very classic, very simple.
00:50:25.480 | And the same in maths,
00:50:26.560 | when you have axioms in geometry,
00:50:31.120 | you have the same feeling.
00:50:32.400 | And in poetry, sometimes you see the same.
00:50:35.880 | - And if you look back at your childhood,
00:50:40.960 | you grew up in Russia,
00:50:42.800 | you maybe were born as a researcher in Russia,
00:50:46.240 | you've developed as a researcher in Russia,
00:50:47.960 | you've came to United States,
00:50:49.720 | you've been in a few places.
00:50:51.760 | If you look back,
00:50:53.400 | what was some of your happiest moments as a researcher?
00:50:58.400 | Some of the most profound moments,
00:51:01.960 | not in terms of their impact on society,
00:51:06.280 | but in terms of their impact
00:51:09.640 | on how damn good you feel that day,
00:51:12.600 | and you remember that moment.
00:51:15.400 | - You know, every time when you found something,
00:51:19.560 | it is great.
00:51:21.920 | - Yeah, one of the things in life.
00:51:24.120 | - Every simple things.
00:51:26.440 | - Just even-- - But my general feeling
00:51:27.920 | is that most of my time I was wrong.
00:51:31.280 | You should go again and again and again,
00:51:35.280 | and try to be honest in front of yourself.
00:51:39.400 | Not to make interpretation,
00:51:41.800 | but try to understand that it related to ground truth.
00:51:45.880 | It is not my blah, blah, blah interpretation
00:51:50.880 | and something like that.
00:51:52.560 | - But you're allowed to get excited
00:51:54.160 | at the possibility of discovery.
00:51:57.240 | - Oh yeah.
00:51:58.080 | - You have to double check it, but--
00:52:00.000 | - No, but how it related to the ground truth,
00:52:04.600 | is it just temporary or it is forever?
00:52:10.840 | - You know, you always have a feeling
00:52:13.200 | when you found something, how big is that?
00:52:18.720 | So 20 years ago when we discovered statistical learning,
00:52:23.400 | so nobody believed, except for one guy, Dudley, from MIT.
00:52:28.400 | And then in 20 years, it became fashion.
00:52:36.600 | And the same with support vector machines.
00:52:39.640 | That's kernel machines.
00:52:41.400 | - So with support vector machines and learning theory,
00:52:44.280 | when you were working on it, you had a sense?
00:52:50.160 | That you had a sense of the profundity of it?
00:52:55.640 | How that this seems to be right, this seems to be powerful.
00:53:00.640 | - Right, absolutely, immediately.
00:53:03.760 | I recognize that it will last forever.
00:53:08.280 | And now, when I found this invariance story,
00:53:13.280 | I have a feeling that it is complete.
00:53:21.280 | Because I have proof that there are no different mechanisms.
00:53:24.680 | You can have some cosmetic improvement, you can do,
00:53:29.680 | but in terms of invariance, you need both invariance
00:53:37.080 | and statistical learning, and they should work together.
00:53:41.520 | But also, I'm happy that we can formulate
00:53:46.520 | what is intelligence, from that.
00:53:49.960 | And to separate from technical part.
00:53:54.200 | That is completely different.
00:53:57.280 | - Absolutely.
00:53:58.240 | Well, Vladimir, thank you so much for talking today.
00:54:00.320 | - Thank you.