
Vladimir Vapnik: Predicates, Invariants, and the Essence of Intelligence | Lex Fridman Podcast #71


Chapters

0:00 Introduction
2:55 Alan Turing: science and engineering of intelligence
9:09 What is a predicate?
14:22 Plato's world of ideas and world of things
21:06 Strong and weak convergence
28:37 Deep learning and the essence of intelligence
50:36 Symbolic AI and logic-based systems
54:31 How hard is 2D image understanding?
60:23 Data
66:39 Language
74:54 Beautiful idea in statistical theory of learning
79:28 Intelligence and heuristics
82:23 Reasoning
85:11 Role of philosophy in learning theory
91:40 Music (speaking in Russian)
95:08 Mortality

Whisper Transcript

00:00:00.000 | The following is a conversation with Vladimir Vapnik,
00:00:03.220 | part two, the second time we spoke on the podcast.
00:00:07.200 | He's the co-inventor of support vector machines,
00:00:09.780 | support vector clustering, VC theory,
00:00:12.120 | and many foundational ideas in statistical learning.
00:00:14.960 | He was born in the Soviet Union,
00:00:17.300 | worked at the Institute of Control Sciences in Moscow,
00:00:20.260 | then in the US, worked at AT&T, NEC Labs,
00:00:24.700 | Facebook AI Research,
00:00:26.120 | and now is a professor at Columbia University.
00:00:29.400 | His work has been cited over 200,000 times.
00:00:33.040 | The first time we spoke on the podcast
00:00:34.880 | was just over a year ago, one of the early episodes.
00:00:39.000 | This time, we spoke after a lecture he gave
00:00:41.460 | titled "Complete Statistical Theory of Learning"
00:00:44.440 | as part of the MIT series of lectures on deep learning
00:00:47.360 | and AI that I organized.
00:00:50.200 | I'll release the video of the lecture in the next few days.
00:00:53.720 | This podcast and the lecture are independent from each other
00:00:56.840 | so you don't need one to understand the other.
00:00:59.420 | The lecture is quite technical and math heavy,
00:01:03.100 | so if you do watch both,
00:01:04.360 | I recommend listening to this podcast first
00:01:06.800 | since the podcast is probably a bit more accessible.
00:01:10.180 | This is the Artificial Intelligence Podcast.
00:01:14.000 | If you enjoy it, subscribe on YouTube,
00:01:16.080 | give it five stars on Apple Podcasts,
00:01:17.920 | support it on Patreon,
00:01:19.160 | or simply connect with me on Twitter
00:01:21.080 | at Lex Fridman, spelled F-R-I-D-M-A-N.
00:01:24.760 | As usual, I'll do one or two minutes of ads now
00:01:27.880 | and never any ads in the middle
00:01:29.320 | that can break the flow of the conversation.
00:01:31.460 | I hope that works for you
00:01:32.880 | and doesn't hurt the listening experience.
00:01:35.000 | This show is presented by Cash App,
00:01:38.200 | the number one finance app in the App Store.
00:01:40.600 | When you get it, use code LEXPODCAST.
00:01:43.800 | Cash App lets you send money to friends,
00:01:45.840 | buy Bitcoin, and invest in the stock market
00:01:48.120 | with as little as $1.
00:01:49.880 | Brokerage services are provided by Cash App Investing,
00:01:52.640 | a subsidiary of Square and member SIPC.
00:01:56.480 | Since Cash App allows you to send
00:01:58.000 | and receive money digitally peer-to-peer,
00:02:00.520 | and security in all digital transactions is very important,
00:02:03.600 | let me mention the PCI Data Security Standard,
00:02:06.760 | PCI DSS Level 1, that Cash App is compliant with.
00:02:11.760 | I'm a big fan of standards for safety and security,
00:02:15.280 | and PCI DSS is a good example of that,
00:02:18.940 | where a bunch of competitors got together
00:02:21.020 | and agreed that there needs to be a global standard
00:02:23.480 | around the security of transactions.
00:02:25.800 | Now, we just need to do the same for autonomous vehicles
00:02:28.880 | and AI systems in general.
00:02:30.480 | So again, if you get Cash App from the App Store
00:02:33.880 | or Google Play, and use the code LEXPODCAST,
00:02:37.280 | you get $10, and Cash App will also donate $10 to FIRST,
00:02:41.240 | one of my favorite organizations
00:02:43.120 | that is helping to advance robotics and STEM education
00:02:46.420 | for young people around the world.
00:02:49.720 | And now, here's my conversation with Vladimir Vapnik.
00:02:54.140 | You and I talked about Alan Turing yesterday a little bit.
00:02:58.540 | - Yes.
00:02:59.780 | - And that he, as the father of artificial intelligence,
00:03:02.660 | may have instilled in our field an ethic of engineering,
00:03:05.700 | and not science, seeking more to build intelligence
00:03:09.540 | rather than to understand it.
00:03:11.940 | What do you think is the difference
00:03:13.300 | between these two paths of engineering intelligence
00:03:17.700 | and the science of intelligence?
00:03:21.220 | - It's a completely different story.
00:03:23.660 | Engineering is imitation of human activity.
00:03:27.620 | You have to make a device which behaves as a human behaves,
00:03:33.500 | has all the functions of a human.
00:03:39.180 | It does not matter how you do it.
00:03:40.880 | But to understand what intelligence is about
00:03:46.020 | is quite a different problem.
00:03:47.700 | So I think, I believe, that it's somehow related
00:03:53.880 | to the predicates we talked about yesterday.
00:03:56.380 | Because, look at Vladimir Propp's idea.
00:04:02.860 | He found just 31 predicates.
00:04:12.660 | He called them units, which can explain human behavior,
00:04:17.660 | at least in Russian tales.
00:04:20.780 | He looked at Russian tales and derived them from that.
00:04:24.940 | And then people realized that it is much wider
00:04:27.460 | than in Russian tales.
00:04:29.580 | It is on TV, in movie serials, and so on and so on.
00:04:33.940 | - So you're talking about Vladimir Propp.
00:04:37.820 | - Right.
00:04:38.660 | - Who in 1928 published a book,
00:04:40.020 | Morphology of the Folk Tale, describing 31 predicates
00:04:44.260 | that have this kind of sequential structure
00:04:48.700 | that a lot of the stories, narratives follow
00:04:52.860 | in Russian folklore and in other content.
00:04:55.040 | We'll talk about it.
00:04:56.100 | I'd like to talk about predicates in a focused way.
00:04:59.180 | But let me, if you'll allow me to stay zoomed out
00:05:02.180 | on our friend Alan Turing.
00:05:03.740 | And he inspired a generation with the imitation game.
00:05:10.180 | - Yes.
00:05:11.660 | - Do you think, if we can linger on that a little bit longer,
00:05:15.220 | do you think we can learn, do you think learning
00:05:19.620 | to imitate intelligence can get us closer
00:05:22.380 | to understanding intelligence?
00:05:25.460 | So why do you think imitation is so far from understanding?
00:05:30.460 | - I think that it is different.
00:05:34.620 | You have different goals.
00:05:37.540 | So your goal is to create something, something useful.
00:05:42.540 | And that is great.
00:05:45.980 | And you can see how much has been done
00:05:49.740 | and I believe that even more will be done.
00:05:52.340 | You have self-driving cars and also this business.
00:05:56.900 | It is great.
00:05:58.420 | And it was inspired by Turing's vision.
00:06:02.080 | But understanding is very difficult.
00:06:05.500 | It's more or less a philosophical category.
00:06:08.360 | What does it mean to understand the world?
00:06:11.300 | I believe in a scheme which starts from Plato:
00:06:15.820 | that there exists a world of ideas.
00:06:19.220 | I believe that intelligence is a world of ideas.
00:06:22.940 | But it is a world of pure ideas.
00:06:25.900 | And when you combine them with real things,
00:06:33.220 | it creates, as in my case, invariants,
00:06:36.540 | which are very specific.
00:06:38.580 | And I believe that combining ideas
00:06:43.580 | in a way that constructs invariants is intelligence.
00:06:49.820 | But first of all, predicates.
00:06:53.340 | If you know the predicates, and hopefully
00:06:56.180 | not too many predicates exist.
00:07:00.820 | For example, 31 predicates for human behavior,
00:07:04.260 | it is not a lot.
00:07:06.060 | - Vladimir Propp used 31,
00:07:08.780 | you can even call them predicates,
00:07:12.340 | 31 predicates to describe stories, narratives.
00:07:17.340 | - Right.
00:07:18.380 | - So you think human behavior,
00:07:19.380 | how much of human behavior, how much of our world,
00:07:23.100 | our universe, all the things that matter in our existence
00:07:27.540 | can be summarized in predicates of the kind
00:07:30.860 | that Propp was working with?
00:07:32.660 | - I think that we have a lot of formal behavior.
00:07:36.980 | But I think that there are far fewer predicates.
00:07:41.020 | Because even in this example, which I gave you yesterday,
00:07:45.260 | you saw that predicate can be,
00:07:50.260 | one predicate can construct many different invariants,
00:07:56.700 | depending on your data.
00:07:59.380 | They're applied to different data,
00:08:01.620 | and they give different invariants.
00:08:03.740 | But pure ideas, maybe not so much.
00:08:08.660 | - Not so many.
00:08:09.940 | - I don't know about that.
00:08:11.420 | But that is my guess, my hope; that's why the challenge
00:08:15.060 | about digit recognition: how much you need.
00:08:18.780 | - I think we'll talk about computer vision
00:08:21.900 | and 2D images a little bit in your challenge.
00:08:24.820 | - That's exactly about intelligence.
00:08:26.780 | - That's exactly, that's exactly about,
00:08:30.820 | no, that hopes to be exactly about the spirit
00:08:34.460 | of intelligence in the simplest possible way.
00:08:37.540 | - Yeah, absolutely, you should start in the simplest way.
00:08:40.380 | Otherwise you will not be able to do it.
00:08:42.700 | - Well, there's an open question whether starting
00:08:45.540 | at the MNIST digit recognition is a step
00:08:49.220 | towards intelligence, or it's an entirely different thing.
00:08:52.700 | - I think that to beat records using, say,
00:08:56.820 | 100 or 200 times fewer examples, you need intelligence.
00:09:00.700 | - You need intelligence.
00:09:01.540 | So let's, because you use this term,
00:09:03.780 | and it would be nice, I'd like to ask simple,
00:09:07.260 | maybe even dumb questions.
00:09:09.980 | Let's start with a predicate.
00:09:11.500 | In terms of terms and how you think about it,
00:09:14.940 | what is a predicate?
00:09:16.100 | - I don't know.
00:09:18.860 | I have a feeling that formulas exist.
00:09:22.860 | But I believe that, among predicates for 2D images,
00:09:27.860 | one of them is symmetry.
00:09:32.300 | - Hold on a second, sorry.
00:09:33.520 | Sorry to interrupt and pull you back.
00:09:36.460 | At the simplest level, we're not being profound currently.
00:09:40.700 | A predicate is a statement of something that is true.
00:09:44.020 | - Yes.
00:09:45.760 | - Do you think of predicates as somehow
00:09:50.540 | probabilistic in nature, or is this binary,
00:09:54.700 | this is truly constraints of logical statements
00:09:59.020 | about the world?
00:10:00.220 | - In my definition, the simplest predicate is a function.
00:10:04.180 | A function, and you can use this function
00:10:07.580 | to take an inner product; that is a predicate.
00:10:10.900 | - What's the input, and what's the output of the function?
00:10:14.020 | - Input is x, something which is input in reality.
00:10:18.660 | Say if you consider digit recognition,
00:10:22.460 | it's pixel space, input.
00:10:25.020 | So it is a function on pixel space.
00:10:29.800 | It can be any function on pixel space.
00:10:34.660 | And you choose, and I believe that there are
00:10:39.500 | several functions which are important
00:10:42.940 | for understanding images.
00:10:46.440 | One of them is symmetry.
00:10:48.280 | It's not so simple a construction,
00:10:50.980 | as I described with linearity, with all this stuff.
00:10:55.020 | But another, I believe, I don't know how many,
00:10:58.860 | is how well-structurized the picture is.
00:11:03.260 | - Structurized?
00:11:04.340 | - Yeah.
00:11:05.180 | - What do you mean by structurized?
00:11:06.980 | - It is a formal definition: say, something heavy
00:11:11.660 | in the left corner, not so heavy in the middle, and so on.
00:11:17.060 | You describe in general the concept of what you assume.
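A minimal Python sketch of a predicate in the sense described here: a function ψ on pixel space, used through its inner product with another function. It assumes images are small grids of nonnegative ink values; `left_heaviness` is a hypothetical formalization of "something heavy in the left corner", not Vapnik's definition, and the inner product is approximated by an average over a sample rather than an integral.

```python
# Images are assumed to be small 2-D grids of nonnegative "ink" values.
Image = list[list[float]]


def left_heaviness(img: Image) -> float:
    """Hypothetical predicate: fraction of the total ink lying in the left half."""
    width = len(img[0])
    total = sum(sum(row) for row in img)
    if total == 0:
        return 0.0
    left = sum(sum(row[: width // 2]) for row in img)
    return left / total


def empirical_inner_product(psi, f, images) -> float:
    """Estimate <psi, f> by averaging psi(x) * f(x) over a sample of images."""
    return sum(psi(x) * f(x) for x in images) / len(images)
```

Any other function of the pixels could be swapped in for `left_heaviness`; the point is only that a predicate is an ordinary function, and what gets compared later is its average against data.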
00:11:21.860 | - Concepts, some kind of universal concepts.
00:11:25.240 | - Yeah.
00:11:26.700 | But I don't know how to formalize this.
00:11:29.200 | - Do you, so this is the thing.
00:11:31.600 | There's a million ways we can talk about this.
00:11:33.640 | I'll keep bringing it up.
00:11:34.680 | But we humans have such concepts,
00:11:38.180 | when we look at digits.
00:11:41.560 | But it's hard to put them, just like you're saying now,
00:11:43.900 | it's hard to put them into words.
00:11:46.000 | - You know, here is an example.
00:11:47.760 | When music critics try to describe music,
00:11:53.640 | they use predicates.
00:11:57.100 | Not too many predicates, but in different combinations.
00:12:02.880 | But they have some special words for describing music.
00:12:07.880 | And the same should be true for images.
00:12:12.720 | But maybe there are critics who understand
00:12:15.680 | the essence of what an image is about.
00:12:19.560 | - Do you think there exist critics
00:12:23.640 | who can summarize the essence of images, human beings?
00:12:28.640 | - I hope so, yes.
00:12:31.900 | But--
00:12:32.740 | - Explicitly state them on paper.
00:12:34.960 | The fundamental question I'm asking is,
00:12:42.600 | do you think there exists a small set of predicates
00:12:46.240 | that will summarize images?
00:12:48.080 | It feels to our mind like it does,
00:12:51.240 | that the concept of what makes a two and a three and a four.
00:12:56.000 | - No, no, no, it's not on this level.
00:12:58.920 | It should not describe two, three, four.
00:13:04.960 | It describes some construction
00:13:07.680 | which allows you to create invariants.
00:13:11.960 | - An invariant, sorry to stick on this, but terminology.
00:13:16.160 | - An invariant is a property of your image.
00:13:21.160 | I can say, looking at my image, it is more or less symmetric.
00:13:31.640 | And I can give you a value of symmetry.
00:13:34.160 | Say, level of symmetry, using this function
00:13:39.400 | which I gave yesterday.
00:13:42.240 | And you can describe that your image
00:13:47.240 | has these characteristics,
00:13:51.600 | exactly in the way music critics describe music.
00:13:56.600 | But this is an invariant applied to specific data,
00:14:02.680 | to specific music, to something.
00:14:07.720 | I strongly believe in this Plato idea
00:14:12.440 | that there exists a world of predicates
00:14:15.000 | and a world of reality, and predicates and reality
00:14:18.920 | are somehow connected, and you have to know that.
00:14:22.480 | - Let's talk about Plato a little bit.
00:14:24.020 | So you draw a line from Plato to Hegel to Wigner to today.
00:14:29.020 | - Yes.
00:14:30.280 | - So Plato has forms, the theory of forms.
00:14:35.560 | There's a world of ideas and a world of things
00:14:38.600 | as you talk about and there's a connection.
00:14:40.440 | And presumably the world of ideas is very small
00:14:44.720 | and the world of things is arbitrarily big.
00:14:48.060 | But they're all, what Plato calls them,
00:14:49.840 | like it's a shadow, the real world is a shadow
00:14:54.040 | from the world of forms.
00:14:54.880 | - Yeah, you have projection.
00:14:56.840 | - Projection.
00:14:57.680 | - Of the world of ideas.
00:14:59.240 | - Yeah, very poetic.
00:15:00.720 | - In reality you can realize this projection
00:15:04.840 | using these invariants, because it is a projection
00:15:09.320 | onto specific examples, which creates specific features
00:15:13.680 | of specific objects.
00:15:15.120 | - So the essence of intelligence is while only being able
00:15:22.400 | to observe the world of things,
00:15:24.720 | try to come up with a world of ideas.
00:15:27.040 | - Exactly, like in this music story.
00:15:30.000 | Intelligent music critics know all this world
00:15:33.160 | and have a feeling about what--
00:15:34.800 | - I feel like that's a contradiction,
00:15:36.360 | intelligent music critics.
00:15:39.160 | I think music is to be
00:15:42.080 | enjoyed in all its forms.
00:15:47.720 | The notion of critic like a food critic.
00:15:50.040 | - No, I don't want that emotion.
00:15:52.360 | - That's an interesting question.
00:15:53.760 | Does emotion, there are certain elements
00:15:56.720 | of the human psychology, of the human experience
00:16:00.200 | which seem to almost contradict intelligence and reason.
00:16:05.200 | Like emotion, like fear, like love, all of those things.
00:16:10.520 | Are those not connected in any way to the space of ideas?
00:16:16.440 | - That I don't know.
00:16:18.760 | - I just want to concentrate on a very simple story,
00:16:25.280 | on digit recognition.
00:16:27.960 | - So you don't think you have to love and fear death
00:16:30.480 | in order to recognize digits?
00:16:32.840 | - I don't know.
00:16:34.560 | Because it's so complicated.
00:16:37.000 | It involves a lot of stuff which I never considered.
00:16:41.600 | But I know about digit recognition.
00:16:44.280 | And I know that for digit recognition
00:16:49.320 | to get the records from a small number of observations,
00:16:57.360 | you need predicates.
00:16:59.400 | But not special predicates for this problem,
00:17:02.760 | but universal predicates which understand the world of images.
00:17:08.600 | - Of visual information.
00:17:09.760 | - Visual, yeah.
00:17:11.400 | But as a first step, they understand, say,
00:17:15.880 | the world of handwritten digits or characters
00:17:19.800 | or something simple.
00:17:21.640 | - So like you said, symmetry is an interesting one.
00:17:23.960 | - That's what I think one of the predicates
00:17:27.560 | related to symmetry.
00:17:29.400 | The level of symmetry.
00:17:30.960 | - Okay, degree of symmetry.
00:17:32.160 | So you think symmetry at the bottom is a universal notion
00:17:37.160 | and there's degrees of a single kind of symmetry
00:17:41.520 | or is there many kinds of symmetries?
00:17:44.200 | - Many kinds of symmetries.
00:17:46.040 | There is symmetry and anti-symmetry, say, the letter S.
00:17:52.400 | It has vertical anti-symmetry.
00:17:56.340 | And it could be diagonal symmetry, vertical symmetry.
00:18:02.680 | - So when you cut vertically the letter S.
00:18:07.680 | - Yeah, then the upper part and lower part
00:18:12.800 | go in different directions.
00:18:15.720 | - Yeah, inverted along the Y axis.
00:18:18.960 | But that's just like one example of symmetry, right?
00:18:21.280 | Isn't there like--
00:18:22.120 | - Right, but there is a degree of symmetry.
00:18:26.360 | If you apply all this derivative stuff
00:18:29.160 | to do tangent distance,
00:18:34.160 | whatever I described, you can have a degree of symmetry.
00:18:40.040 | And that is what describes the image.
00:18:45.480 | It is the same as how you would describe this image.
00:18:51.920 | The same with digits: it has anti-symmetry.
00:18:56.920 | The digit three is symmetric, more or less; look for symmetry.
00:19:02.920 | - Do you think such concepts like symmetry,
00:19:07.840 | predicates like symmetry,
00:19:09.840 | is it a hierarchical set of concepts?
00:19:14.320 | Or are these independent, distinct predicates
00:19:20.080 | that we want to discover, some set of?
00:19:23.640 | - No, there is an idea of symmetry.
00:19:26.000 | And you can make this idea of symmetry
00:19:29.200 | very general, like a degree of symmetry.
00:19:35.240 | A degree of symmetry can be zero, no symmetry at all.
00:19:40.720 | Or the degree of symmetry can say more or less symmetrical.
00:19:47.000 | But you have one of these descriptions.
00:19:50.520 | And symmetry can be different.
00:19:52.520 | As I said: horizontal, vertical, diagonal,
00:19:56.360 | and anti-symmetry is also a concept of symmetry.
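The "degree of symmetry" can be sketched as a number between 0 and 1. This is an illustration under stated assumptions, not the function from Vapnik's lecture: it scores an image I against a pixel permutation T (a mirror, a transpose, or a 180-degree rotation) by ⟨I, T(I)⟩ / ⟨I, I⟩, which for nonnegative images equals 1 exactly when T leaves the image unchanged and 0 when I and T(I) share no ink. Reading the letter S's "anti-symmetry" as invariance under 180-degree rotation is an interpretation, and the 3×3 grid is a hypothetical toy glyph.

```python
Image = list[list[float]]


def flip_left_right(img: Image) -> Image:
    """Mirror across the vertical axis."""
    return [row[::-1] for row in img]


def flip_up_down(img: Image) -> Image:
    """Mirror across the horizontal axis."""
    return img[::-1]


def transpose(img: Image) -> Image:
    """Mirror across the main diagonal (square images only)."""
    return [list(col) for col in zip(*img)]


def rotate_180(img: Image) -> Image:
    """Rotate by 180 degrees; read here as the 'anti-symmetry' of a letter S."""
    return [row[::-1] for row in img[::-1]]


def symmetry_degree(img: Image, transform) -> float:
    """<I, T(I)> / <I, I> for a nonnegative image I and pixel permutation T."""
    t = transform(img)
    num = sum(a * b for r1, r2 in zip(img, t) for a, b in zip(r1, r2))
    den = sum(a * a for row in img for a in row)
    return num / den if den else 0.0


letter_s = [[0, 1, 1],
            [0, 1, 0],
            [1, 1, 0]]
for name, t in [("left-right", flip_left_right), ("up-down", flip_up_down),
                ("diagonal", transpose), ("rotate 180", rotate_180)]:
    print(f"{name:>10}: {symmetry_degree(letter_s, t):.2f}")
```

For this S-shaped grid only the 180-degree rotation leaves the pattern unchanged (score 1.00); each of the three mirrors scores 0.60, which is the kind of graded, rather than yes-or-no, description the conversation points at.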
00:20:01.360 | - What about shape in general?
00:20:03.320 | I mean, symmetry is a fascinating notion, but--
00:20:05.840 | - No, no, I'm talking about digits.
00:20:08.640 | I would like to concentrate on all,
00:20:11.400 | I would like to know predicates for digit recognition.
00:20:14.520 | - Yes, but symmetry is not enough
00:20:17.000 | for digit recognition, right?
00:20:19.400 | - It is not necessarily for digit recognition.
00:20:22.560 | It helps to create an invariant,
00:20:26.800 | which you can use when you have examples
00:20:31.800 | for digit recognition.
00:20:35.040 | You have the regular problem of digit recognition:
00:20:38.320 | you have examples of the first class or second class.
00:20:41.640 | Plus, you know that there exists a concept of symmetry.
00:20:45.880 | And when you're looking for a decision rule,
00:20:50.440 | you will apply the concept of symmetry,
00:20:55.440 | this level of symmetry, which you estimate from--
00:21:00.160 | So let's talk, everything comes from weak convergence.
00:21:05.160 | - What is convergence, what is weak convergence,
00:21:09.280 | what is strong convergence?
00:21:11.440 | I'm sorry, I'm gonna do this to you.
00:21:13.400 | What are we converging from and to?
00:21:15.320 | - You're converging, you would like to have a function.
00:21:20.520 | The function which, say, is an indicator function,
00:21:23.640 | which indicates your digit five, for example.
00:21:28.640 | - A classification task?
00:21:31.520 | - Let's talk only about classification.
00:21:33.800 | - So classification means you will say
00:21:36.880 | whether this is a five or not,
00:21:38.640 | or say which of the 10 digits it is.
00:21:40.680 | - Right, right.
00:21:42.160 | I would like to have these functions.
00:21:45.600 | Then, I have some examples.
00:21:51.600 | I can consider a property of these examples.
00:22:00.240 | Say, symmetry, and I can measure the level of symmetry
00:22:04.880 | for every digit.
00:22:08.060 | And then I can take the average over my training data,
00:22:13.060 | and I will consider only functions
00:22:19.660 | of conditional probability,
00:22:24.020 | which I'm looking for as my decision rule,
00:22:26.340 | which, applied to digits,
00:22:36.460 | will give me the same average as I observe on training data.
00:22:40.780 | So actually, this is a different level
00:22:45.380 | of description of what you want.
00:22:48.500 | You show not just one digit.
00:22:53.500 | You show this predicate, which shows a general property
00:22:58.860 | of all digits which you have in mind.
00:23:03.740 | If you have in mind the digit three,
00:23:06.080 | it gives you a property of the digit three,
00:23:10.380 | and you select as the admissible set of functions
00:23:13.580 | only functions which keep this property.
00:23:16.980 | You will not consider other functions.
00:23:20.760 | So you're immediately looking for a smaller subset of functions.
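The selection of admissible functions just described can be sketched as a filter. One way to write the constraint is (1/ℓ) Σ ψ(x_i) f(x_i) = (1/ℓ) Σ ψ(x_i) y_i over the ℓ training pairs. Everything concrete here is an assumption for illustration: scalar inputs, a finite list of threshold rules as candidates, and a tolerance `tol`; the helper names are hypothetical. In Vapnik's own formulation the constraint is imposed while estimating the conditional probability, not by enumerating candidates.

```python
def invariant_gap(psi, f, xs, ys) -> float:
    """|mean psi(x_i) f(x_i) - mean psi(x_i) y_i| over the training sample."""
    n = len(xs)
    predicted = sum(psi(x) * f(x) for x in xs) / n
    observed = sum(psi(x) * y for x, y in zip(xs, ys)) / n
    return abs(predicted - observed)


def admissible(candidates, predicates, xs, ys, tol=1e-9):
    """Keep only candidate rules that reproduce every predicate's training average."""
    return [f for f in candidates
            if all(invariant_gap(psi, f, xs, ys) <= tol for psi in predicates)]


def threshold_rule(t):
    """Hypothetical candidate family: classify x as 1 when x > t."""
    return lambda x: 1 if x > t else 0


# Toy one-dimensional data and the identity predicate psi(x) = x.
xs, ys = [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]
candidates = [threshold_rule(t) for t in (0.3, 0.5, 0.7)]
kept = admissible(candidates, [lambda x: x], xs, ys)
print(len(kept))
```

Here only the threshold at 0.5 reproduces the training average of ψ(x) y, so one candidate survives. With a single predicate the admissible set is still large in general; each additional predicate narrows it further.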
00:23:24.940 | - That's what you mean by admissible functions.
00:23:27.140 | - Admissible function, exactly.
00:23:28.420 | - Which is still a pretty large,
00:23:30.940 | for the number three, it's a large--
00:23:32.780 | - It's pretty large, but if you have one predicate.
00:23:36.600 | But there is strong and weak convergence.
00:23:41.600 | Strong convergence is convergence in functions.
00:23:45.280 | You're looking for the function, on one function,
00:23:49.240 | and you're looking for another function.
00:23:51.900 | And the squared difference between them should be small.
00:23:56.900 | If you take the difference at any point,
00:24:01.880 | make a square, take an integral, and it should be small.
00:24:05.640 | That is convergence in functions.
00:24:08.040 | Suppose you have some function, any function.
00:24:11.280 | So I would say that some functions
00:24:15.440 | converge to this function
00:24:16.960 | if the integral of the squared difference between them is small.
00:24:22.880 | - That's the definition of strong convergence.
00:24:24.800 | - That is the definition of strong convergence.
00:24:25.800 | - Two functions, the integral of the squared difference is small.
00:24:28.960 | - It is convergence in functions.
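Written out in standard notation, with f_ℓ the approximating functions, f_0 the target, and μ the measure on the input space (symbols chosen here for illustration, not taken from the conversation), strong convergence means the L_2 distance goes to zero:

```latex
\lim_{\ell \to \infty} \int \bigl( f_\ell(x) - f_0(x) \bigr)^2 \, d\mu(x) = 0
```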
00:24:31.160 | - Yeah.
00:24:32.320 | - But you have a different convergence, in functionals.
00:24:36.760 | You take any function, you take some function, phi,
00:24:40.120 | and take the inner product of this function with the f function,
00:24:45.120 | the f zero function, which you want to find.
00:24:50.360 | And that gives you some value.
00:24:52.000 | So you say that a sequence of functions converges
00:25:00.080 | in inner product to this function,
00:25:03.080 | if this value of the inner product converges to the value for f zero.
00:25:08.080 | That is for one phi.
00:25:12.520 | But weak convergence requires that it converges
00:25:15.680 | for any function of the Hilbert space.
00:25:20.680 | If it converges for any function of the Hilbert space,
00:25:24.280 | then you say that this is weak convergence.
00:25:28.320 | You can think that when you take an integral,
00:25:32.240 | that is a property, an integral property of the function.
00:25:36.000 | For example, if you take sine or cosine,
00:25:39.200 | it is a coefficient of, say, the Fourier expansion.
00:25:43.960 | So if it converges for all coefficients
00:25:50.560 | of the Fourier expansion, then under some conditions
00:25:54.280 | it converges to the function you're looking for.
00:25:58.160 | But weak convergence means any property.
00:26:01.240 | Convergence not point-wise,
00:26:05.880 | but in integral properties of the function.
00:26:08.660 | So weak convergence means integral properties of functions.
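Weak convergence asks that ⟨φ, f_ℓ⟩ → ⟨φ, f_0⟩ for every φ in the Hilbert space. A classic textbook example, not from the conversation, shows the gap between the two notions: on [0, π], f_n(x) = sin(nx) converges weakly to 0, since its inner product with any fixed φ shrinks, yet ∫ sin²(nx) dx = π/2 for every n ≥ 1, so it does not converge strongly. The sketch checks this numerically with a midpoint rule for the single test function φ(x) = x; the step count is an arbitrary choice.

```python
import math


def integrate(g, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))


def phi(x: float) -> float:
    """One fixed test function from L2[0, pi]; any other choice would also do."""
    return x


for n in (1, 10, 100):
    fn = lambda x, n=n: math.sin(n * x)  # f_n(x) = sin(n x); the limit is f_0 = 0
    strong = integrate(lambda x: fn(x) ** 2, 0.0, math.pi)   # ||f_n - f_0||^2
    weak = integrate(lambda x: phi(x) * fn(x), 0.0, math.pi)  # <phi, f_n - f_0>
    print(f"n={n:3d}  ||f_n||^2 = {strong:.4f}  <phi, f_n> = {weak:.4f}")
```

The exact values are π/2 for the squared norm and π(−1)^(n+1)/n for ⟨φ, f_n⟩, so the inner products shrink like 1/n while the norm stays put: every "integral property" measured by φ converges even though the functions themselves do not.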
00:26:13.880 | When I'm talking about predicates,
00:26:16.120 | I would like to formulate which integral properties
00:26:21.120 | I would like to have for convergence.
00:26:27.920 | So if I take one predicate,
00:26:32.920 | it's a function with which I measure a property.
00:26:35.540 | If I use one predicate and say
00:26:40.640 | I will consider only functions which give me
00:26:44.840 | the same value as this predicate,
00:26:47.960 | I am selecting a set of functions
00:26:52.960 | which is admissible in the sense that the function
00:26:57.680 | which I am looking for is in this set of functions,
00:27:01.080 | because I check on training data
00:27:06.080 | that it gives the same.
00:27:07.620 | - Yeah, so it always has to be connected
00:27:10.320 | to the training data in terms of--
00:27:12.640 | - Yeah, but the property you can know independently
00:27:17.400 | of training data.
00:27:18.800 | And this guy, Propp.
00:27:21.280 | - Yeah.
00:27:22.120 | - So there are formal properties.
00:27:24.040 | 31 properties, and--
00:27:25.480 | - Fairy tale, Russian fairy tale.
00:27:27.280 | - But Russian fairy tales are not so interesting.
00:27:30.520 | More interesting is that people applied this
00:27:33.280 | to movies, to theater, to different things.
00:27:38.280 | The same works, they're universal.
00:27:42.000 | - Well, so I would argue that there's a little bit
00:27:44.800 | of a difference between the kinds of things
00:27:48.560 | that it was applied to, which are essentially stories,
00:27:51.560 | and digit recognition.
00:27:53.420 | - It is the same story.
00:27:55.920 | - You're saying, for digits, there's a story within the digit?
00:27:59.720 | - Yeah.
00:28:00.560 | But my point is why I hope that it is possible
00:28:06.480 | to beat the record using not 60,000,
00:28:11.480 | but, say, 100 times fewer.
00:28:13.860 | Because instead you will give predicates.
00:28:16.580 | And you will select your decision not from
00:28:22.120 | a wide set of functions, but from a set of functions
00:28:25.720 | which keeps these predicates.
00:28:28.080 | But predicates are not related just to digit recognition.
00:28:32.840 | - Right, so--
00:28:33.880 | - Like in Plato's case.
00:28:35.520 | (laughing)
00:28:37.700 | - Do you think it's possible to automatically
00:28:40.240 | discover the predicates?
00:28:42.160 | So you basically said that the essence of intelligence
00:28:46.620 | is the discovery of good predicates.
00:28:49.680 | - Yeah.
00:28:50.500 | - Now the natural question is,
00:28:55.200 | you know, that's what Einstein was good at doing in physics.
00:28:58.200 | Can we make machines do these kinds of discovery
00:29:03.080 | of good predicates?
00:29:04.560 | Or is this ultimately a human endeavor?
00:29:06.840 | - That I don't know.
00:29:09.120 | I don't think that a machine can do it.
00:29:11.440 | Because according to the theory of weak convergence,
00:29:16.440 | any function from the Hilbert space can be a predicate.
00:29:23.200 | So you have an infinite number of predicates a priori,
00:29:27.680 | and beforehand you don't know which predicate is good and which is not.
00:29:32.680 | But what Propp showed, and why people call it a breakthrough,
00:29:37.840 | is that there are not too many predicates which cover
00:29:44.920 | most of the situations that happen in the world.
00:29:51.360 | - So there's a sea of predicates.
00:29:53.200 | And most of the, only a small amount are useful
00:29:57.800 | for the kinds of things that happen in the world.
00:30:00.240 | - I would say only a small part of predicates are
00:30:06.360 | very useful.
00:30:08.720 | All of them are useful.
00:30:11.360 | - Only very few are what we should,
00:30:13.720 | let's call them good predicates.
00:30:15.480 | - Very good predicates.
00:30:16.680 | - Very good predicates.
00:30:18.160 | So can we linger on it, what's your intuition,
00:30:21.800 | why is it hard for a machine to discover good predicates?
00:30:26.800 | - Even in my talk I described how to find predicates,
00:30:30.760 | how to find new predicates.
00:30:32.720 | I'm not sure that it is very good.
00:30:35.000 | - What did you propose in your talk?
00:30:36.680 | - No, in my talk I gave an example for diabetes.
00:30:41.680 | - Diabetes, yeah.
00:30:43.800 | - When we achieve some percent,
00:30:46.240 | then we're looking for an area
00:30:48.440 | where some sort of predicate, which I formulate,
00:30:53.760 | does not
00:30:55.880 | keep the invariant.
00:31:01.440 | So if it doesn't keep it, I retrain on my data:
00:31:07.000 | I select only functions which keep it invariant.
00:31:11.160 | And when I did it, I improved my performance.
00:31:14.500 | I can look for this predicate.
00:31:16.520 | I know technically how to do that.
00:31:19.560 | And you can, of course, do it using a machine.
00:31:24.560 | But I'm not sure that we will construct
00:31:28.880 | the smartest predicate.
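The diabetes procedure outlined here, searching for an area where a predicate's invariant is broken, can be sketched as a scan over candidate regions. Everything concrete is hypothetical: the constant predicate ψ(x) = 1 (which turns the invariant into matching class frequencies), the named regions given as indicator functions, the toy data, and the tolerance. Multiplying a predicate by the indicator of a region R yields another function, so the localized version is again a predicate in this sense. The retraining step itself is not shown.

```python
def violated_regions(f, psi, xs, ys, regions, tol=0.05):
    """Names of regions R where the localized invariant
    mean(psi(x) f(x) [x in R]) = mean(psi(x) y [x in R]) fails by more than tol."""
    n = len(xs)
    bad = []
    for name, inside in regions:
        predicted = sum(psi(x) * f(x) for x in xs if inside(x)) / n
        observed = sum(psi(x) * y for x, y in zip(xs, ys) if inside(x)) / n
        if abs(predicted - observed) > tol:
            bad.append(name)
    return bad


def rule(x: float) -> int:
    """Hypothetical current decision rule: a threshold at 0.5."""
    return 1 if x > 0.5 else 0


# Hypothetical toy data on one feature, split into two regions.
xs, ys = [0.1, 0.4, 0.6, 0.9], [0, 1, 1, 1]
regions = [("low", lambda x: x < 0.5), ("high", lambda x: x >= 0.5)]
print(violated_regions(rule, lambda x: 1.0, xs, ys, regions))
```

A region flagged this way is where the current rule disagrees with the data on that integral property; the next step would be to retrain with the localized invariant added as a constraint.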
00:31:31.000 | - But this is the, allow me to linger on it,
00:31:34.200 | because that's the essence, that's the challenge,
00:31:36.320 | that is artificial, that's the human level intelligence
00:31:40.360 | that we seek, is the discovery of these good predicates.
00:31:43.840 | You've talked about deep learning as a way to,
00:31:47.520 | the predicates they use and the functions are mediocre.
00:31:52.520 | We can find better ones.
00:31:55.080 | - Let's talk about deep learning.
00:31:57.360 | - Sure, let's do it.
00:31:58.200 | - I know only Yann LeCun's convolutional network.
00:32:03.200 | And what else?
00:32:05.280 | I don't know, and it's a very simple convolution.
00:32:07.960 | - There's not much else to know.
00:32:08.800 | - It's left and right.
00:32:10.480 | I can do it like that, with one predicate.
00:32:14.120 | It is--
00:32:14.960 | - Convolution is a single predicate.
00:32:16.640 | - It's single, it's a single predicate.
00:32:21.200 | - Yes, but--
00:32:22.040 | - You know exactly: you take the derivative
00:32:25.480 | for translation, and this predicate should be kept.
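One way to read "the derivative for translation" is that convolution builds translation into the architecture: shifting the input shifts the output by the same amount (equivariance), and pooling over positions then gives invariance. This discrete, circular-shift sketch is an illustration under those assumptions, not Vapnik's tangent-distance construction; the signal and the difference kernel are arbitrary toy values.

```python
def circular_conv(signal, kernel):
    """1-D circular convolution: out[i] = sum_j kernel[j] * signal[(i - j) % n]."""
    n = len(signal)
    return [sum(k * signal[(i - j) % n] for j, k in enumerate(kernel)) for i in range(n)]


def shift(signal, s):
    """Circularly shift a signal s positions to the right."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


x = [1, 2, 3, 4, 5]
kernel = [1, -1]  # a toy difference filter
for s in range(len(x)):
    # Equivariance: convolving a shifted input equals shifting the convolved output.
    assert circular_conv(shift(x, s), kernel) == shift(circular_conv(x, kernel), s)
    # Invariance after global max pooling over positions.
    assert max(circular_conv(shift(x, s), kernel)) == max(circular_conv(x, kernel))
print("translation equivariance and pooled invariance hold for every shift")
```

The weight sharing in the kernel is what makes the first property hold for every input; the network does not have to learn it from examples, which is the sense in which it acts as a single built-in predicate.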
00:32:29.940 | - So that's a single predicate,
00:32:32.480 | but humans discovered that one, or at least--
00:32:35.040 | - That is a risk, not too many predicates.
00:32:39.040 | And that is a big story because Yann did it 25 years ago
00:32:43.760 | and nothing so clear was added to deep networks.
00:32:48.760 | And then I don't understand
00:32:53.280 | why we should talk about deep networks
00:32:57.440 | instead of talking about piecewise linear functions
00:33:01.280 | which keep this predicate.
00:33:02.880 | - Well, a counterargument is
00:33:07.320 | that maybe the number of predicates necessary
00:33:11.160 | to solve general intelligence, say in the space of images,
00:33:16.160 | doing efficient recognition of handwritten digits
00:33:20.600 | is very small.
00:33:22.360 | And so we shouldn't be so obsessed about finding them;
00:33:26.000 | we'll find other good predicates
00:33:28.960 | like convolution, for example.
00:33:30.720 | You know, there have been other advancements
00:33:33.880 | like, if you look at the work with attention,
00:33:37.400 | there are attention mechanisms,
00:33:39.480 | used especially in natural language,
00:33:42.160 | focusing the network's ability
00:33:44.200 | to learn which part of the input to look at.
00:33:47.640 | The thing is, there are other things besides predicates
00:33:51.040 | that are important for the actual engineering mechanism
00:33:55.280 | of showing how much you can really do
00:33:58.080 | given such predicates.
00:34:02.120 | - I mean, that's essentially the work of deep learning:
00:34:04.360 | constructing architectures
00:34:07.160 | that are able, given the training data,
00:34:11.400 | to converge towards
00:34:14.720 | a function that can approximate, can generalize well.
00:34:21.340 | It's an engineering problem.
00:34:24.400 | - Yeah, I understand.
00:34:26.000 | But let's talk not on an emotional level
00:34:29.880 | but on a mathematical level.
00:34:31.880 | You have a set of piecewise linear functions.
00:34:36.440 | That is all possible neural networks.
00:34:40.140 | It's just piecewise linear functions.
00:34:44.040 | There are many, many pieces.
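Vapnik's point that a neural network "is just piecewise linear functions" holds for networks with ReLU (or other piecewise linear) activations, which is the assumption in this sketch. With a hypothetical one-hidden-layer network on a scalar input, each hidden unit switches on or off at x = -b/w, so there are at most as many kinks as hidden units, and between consecutive kinks three equally spaced points are always collinear (zero second difference).

```python
import numpy as np

rng = np.random.default_rng(1)
H = 8                                            # hidden units (arbitrary choice)
W1, b1 = rng.normal(size=(H, 1)), rng.normal(size=H)
W2, b2 = rng.normal(size=(1, H)), rng.normal(size=1)

def net(x):
    """Scalar-in, scalar-out ReLU network W2 @ relu(W1 x + b1) + b2, vectorized over x."""
    h = np.maximum(0.0, W1 @ np.atleast_2d(x) + b1[:, None])
    return (W2 @ h + b2[:, None]).ravel()

# Unit i changes state where W1[i] * x + b1[i] = 0.
kinks = np.sort(-b1 / W1[:, 0])
edges = np.concatenate(([kinks[0] - 3.0], kinks, [kinks[-1] + 3.0]))

# On every piece between kinks the network is affine: second difference is zero.
linear_on_every_piece = True
for a, b in zip(edges[:-1], edges[1:]):
    f = net(a + (b - a) * np.array([0.25, 0.5, 0.75]))
    linear_on_every_piece &= bool(np.isclose(f[0] - 2 * f[1] + f[2], 0.0, atol=1e-9))
print(len(kinks), linear_on_every_piece)  # → 8 True
```

Deeper ReLU networks have many more pieces, which is the "large, large number" discussed next, but each piece is still affine.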
00:34:45.360 | - Large, large number of piecewise linear functions.
00:34:47.640 | - Exactly, but-- - Very large.
00:34:49.440 | - Very large. - Almost feels like
00:34:51.280 | too large. - But it's still simpler
00:34:53.280 | than, say, convolution,
00:34:56.160 | than a reproducing kernel Hilbert space,
00:34:58.840 | which has a Hilbert set of functions.
00:35:00.880 | - What's Hilbert space?
00:35:03.000 | - It's a space with an infinite number of coordinates,
00:35:07.160 | say, or functions for expansion, something like that.
00:35:11.820 | So it's much richer.
00:35:13.460 | So when I'm talking about a closed-form solution,
00:35:17.520 | I'm talking about this set of functions,
00:35:20.840 | not the piecewise linear set, which is a particular case.
00:35:29.560 | It is a small part--
00:35:30.920 | - So neural networks are a small part of the space
00:35:33.600 | you're talking about, of functions you're talking about.
00:35:35.960 | - Say, a small set of functions.
00:35:39.080 | Let me take that.
00:35:40.640 | But it is fine, it is fine.
00:35:42.760 | I don't want to discuss small or big,
00:35:46.600 | you take advantage.
00:35:47.960 | So you have some set of functions.
00:35:50.080 | So now when you're trying to create an architecture,
00:35:54.360 | you would like to create an admissible set of functions;
00:35:59.120 | all your tricks are to use not all functions,
00:36:03.360 | but some subset of this set of functions.
00:36:06.000 | Say, when you're introducing a convolutional net,
00:36:10.140 | it is a way to make this subset useful for you.
00:36:15.140 | But from my point of view, convolution,
00:36:19.800 | it is something where you want to keep some invariants,
00:36:24.800 | say, translation invariance.
00:36:27.980 | But now if you understand this,
00:36:31.840 | and you cannot explain on the level of ideas
00:36:36.840 | what a neural network does,
00:36:39.740 | you should agree that it is much better
00:36:44.400 | to have a set of functions.
00:36:46.720 | As I say, this set of functions should be admissible,
00:36:51.140 | it must keep this invariant, this invariant,
00:36:53.640 | and that invariant.
00:36:55.260 | You know that as soon as you incorporate new invariants,
00:36:59.080 | the set of functions becomes smaller and smaller and smaller.
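The claim that each new invariant makes the admissible set smaller can be checked by brute force on a toy problem. All of the following is hypothetical: the domain has only four points, the candidate class is every 0/1 function on them, and each predicate ψ imposes the exact equality Σᵢ ψ(xᵢ) f(xᵢ) = Σᵢ ψ(xᵢ) yᵢ. With three predicates the admissible set collapses to the single target function, the extreme case Vapnik describes later where enough predicates leave only one function.

```python
from itertools import product

x = [0, 1, 2, 3]                  # toy domain (hypothetical)
y = [0, 1, 1, 0]                  # the "true" labels the invariants are measured against
predicates = [lambda v: 1, lambda v: v, lambda v: v * v]   # psi_0, psi_1, psi_2

def keeps_invariant(f, psi):
    """Invariant: sum psi(x_i) f(x_i) must equal sum psi(x_i) y_i."""
    return (sum(psi(a) * fa for a, fa in zip(x, f))
            == sum(psi(a) * ya for a, ya in zip(x, y)))

admissible = list(product([0, 1], repeat=len(x)))   # all 16 binary functions on x
sizes = [len(admissible)]
for psi in predicates:                               # add one invariant at a time
    admissible = [f for f in admissible if keeps_invariant(f, psi)]
    sizes.append(len(admissible))
print(sizes)        # → [16, 6, 2, 1]
print(admissible)   # → [(0, 1, 1, 0)]
```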
00:37:02.160 | - But all the invariants are specified by you, the human.
00:37:05.540 | - Yeah, but what I hope is that there is a standard predicate,
00:37:11.740 | like Propp showed,
00:37:14.200 | that is what I want to find for digit recognition.
00:37:19.640 | If we start, it is a completely new area,
00:37:22.960 | what intelligence is about, on the level
00:37:25.840 | starting from Plato's idea,
00:37:28.640 | what is the world of ideas.
00:37:30.900 | So, and I believe that there are not too many.
00:37:34.780 | But you know, it is amusing that mathematicians
00:37:39.800 | do something in neural networks, in general functions,
00:37:44.040 | but people from literature, from art,
00:37:47.600 | use this all the time.
00:37:49.480 | - That's right.
00:37:50.320 | - New invariants saying, say,
00:37:53.800 | it is great how people describe music,
00:37:57.040 | we should learn from that.
00:37:58.800 | And something on this level,
00:38:02.080 | but so why did Vladimir Propp,
00:38:04.960 | who was just a theorist,
00:38:08.200 | who studied theoretical literature, find that?
00:38:12.280 | - You know what, let me throw that right back at you,
00:38:15.240 | because there's a little bit of a,
00:38:17.360 | that's less mathematical and more emotional,
00:38:20.080 | philosophical, Vladimir Propp.
00:38:22.760 | I mean, he wasn't doing math.
00:38:25.000 | - No.
00:38:25.840 | - And you just said another emotional statement,
00:38:30.120 | which is that you believe this Plato world of ideas
00:38:34.000 | is small.
00:38:34.880 | - I hope.
00:38:37.040 | - I hope.
00:38:37.880 | What's your intuition, though, if we can linger on it?
00:38:43.600 | - You know, it is not just small or big.
00:38:48.640 | I know exactly that when I introduce
00:38:53.000 | some predicate, I decrease the set of functions.
00:38:58.920 | But my goal is to decrease the set of functions a lot.
00:39:02.940 | - By as much as possible.
00:39:05.040 | - By as much as possible.
00:39:06.520 | A good predicate does this.
00:39:11.120 | Then I should choose the next predicate,
00:39:13.360 | which decreases it as much as possible.
00:39:17.280 | So a set of good predicates,
00:39:19.440 | it is such that they decrease
00:39:23.040 | this amount of admissible functions.
00:39:27.840 | - So if each good predicate significantly reduces
00:39:31.060 | the set of admissible functions,
00:39:32.640 | then naturally there should not be that many good predicates.
00:39:35.600 | - No, but if you reduce very well the VC dimension
00:39:40.600 | of the admissible set of functions,
00:39:45.560 | it's small, and you need not too much
00:39:49.160 | training data to do well.
00:39:51.260 | - And VC dimension, by the way,
00:39:55.360 | is some measure of capacity of this set of functions.
00:39:57.760 | - Right.
00:39:58.600 | Roughly speaking, how many functions are in this set.
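As Vapnik says, "how many functions" is only a rough reading. More precisely, the VC dimension is the size of the largest set of points the class can label in every possible way (shatter). A brute-force sketch for two textbook classes on the real line, thresholds [x > t] (VC dimension 1) and intervals [a < x < b] (VC dimension 2). The grid of cut points and the helper `shatters` are assumptions for illustration, and checking particular point sets only illustrates these known values; it does not prove them.

```python
def shatters(points, classifiers):
    """True if the classifiers produce all 2**n labelings of the n points."""
    realized = {tuple(bool(c(p)) for p in points) for c in classifiers}
    return len(realized) == 2 ** len(points)

grid = [0.5 * i for i in range(-2, 10)]          # hypothetical cut points, -1.0 to 4.5
thresholds = [lambda p, t=t: p > t for t in grid]
intervals = [lambda p, a=a, b=b: a < p < b for a in grid for b in grid if a < b]

print(shatters([1.0], thresholds), shatters([1.0, 2.0], thresholds))          # → True False
print(shatters([1.0, 2.0], intervals), shatters([1.0, 2.0, 3.0], intervals))  # → True False
```

Thresholds can never label the pair as (positive, negative), and intervals can never label three points as (positive, negative, positive), which is exactly where each class stops shattering.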
00:40:02.000 | So you're decreasing, decreasing,
00:40:04.000 | and it makes it easy for you to find
00:40:07.720 | the function you're looking for.
00:40:09.120 | But the most important part is to create
00:40:13.400 | a good admissible set of functions.
00:40:15.800 | And probably there are many ways,
00:40:18.880 | but a good predicate is one that can do that.
00:40:23.880 | So for this duck, you should know a little bit about ducks,
00:40:30.560 | because--
00:40:31.600 | - What are the three fundamental laws of ducks?
00:40:35.360 | - Looks like a duck, swims like a duck,
00:40:37.440 | and quacks like a duck.
00:40:38.400 | - You should know something about ducks to be able to--
00:40:41.200 | - Not necessarily.
00:40:42.560 | Looks like, say, a horse.
00:40:45.000 | It's also good.
00:40:45.840 | - It generalizes from ducks.
00:40:50.000 | - And talks like, makes sounds like a horse, or something.
00:40:54.400 | And runs like a horse, and moves like a horse.
00:40:57.380 | It is general.
00:40:58.520 | It is a general predicate that is applied to ducks.
00:41:04.640 | But for a duck, you can say "plays chess like a duck."
00:41:09.900 | - You cannot say "plays chess like a duck."
00:41:11.620 | - Why not?
00:41:12.680 | - So you're saying you can, but that would not be a good--
00:41:15.800 | - No, you will not reduce a lot of functions.
00:41:18.240 | - You would not do, yeah, you would not reduce
00:41:20.200 | the set of functions.
00:41:21.700 | - So you can, the story is, formal story,
00:41:25.140 | mathematical story, is that you can use any function
00:41:28.840 | you want as a predicate.
00:41:30.340 | But some of them are good, some of them are not,
00:41:33.200 | because some of them reduce a lot of functions
00:41:36.120 | to admissible set.
00:41:38.040 | Some of them--
00:41:39.760 | - But the question is, and I'll probably keep asking
00:41:42.120 | this question, but how do we find such,
00:41:45.680 | what's your intuition?
00:41:47.400 | Handwritten recognition, how do we find
00:41:51.080 | the answer to your challenge?
00:41:52.680 | - Yeah, I understand it like that.
00:41:55.980 | I understand what--
00:41:57.920 | - What defines it?
00:41:59.240 | - What it means, a new predicate.
00:42:01.480 | Like, a guy who understands music can say these words
00:42:06.240 | which he uses when he listens to music.
00:42:09.640 | He understands music.
00:42:11.720 | He uses not too many different ones, or you can do like Propp.
00:42:15.600 | You can make a collection of what he says about music,
00:42:19.320 | about this, about that.
00:42:21.040 | There are not too many different situations he describes.
00:42:25.040 | - Because we mentioned Vladimir Propp a bunch,
00:42:26.960 | let me just mention, there's a sequence of 31
00:42:30.200 | structural notions that are common in stories,
00:42:36.920 | and I think--
00:42:37.760 | - He called them units.
00:42:38.600 | - Units, and I think they resonate.
00:42:40.480 | I mean, it starts, just to give an example,
00:42:43.600 | with absentation: a member of the hero's community or family
00:42:46.520 | leaves the security of the home environment,
00:42:48.920 | then it goes to the interdiction:
00:42:51.040 | a forbidding edict or command is passed upon the hero,
00:42:54.520 | don't go there, don't do this.
00:42:56.620 | The hero is warned against some action.
00:42:58.680 | Then, step three, violation of the interdiction.
00:43:03.680 | Break the rules, break out on your own.
00:43:07.580 | Then, reconnaissance: the villain makes an effort
00:43:10.400 | to attain knowledge needed to fulfill their plot,
00:43:12.760 | so on, it goes on like this,
00:43:14.240 | ends in a wedding, number 31, happily ever after.
00:43:19.240 | - No, he just gave a description of all situations.
00:43:25.640 | He understands this world.
00:43:28.160 | - Of folk tales.
00:43:29.280 | - Yeah, not folk tales, but stories.
00:43:33.160 | And these stories are not just in folk tales.
00:43:36.560 | The stories are in detective serials as well.
00:43:39.960 | - And probably in our lives, we probably live--
00:43:43.760 | - Read this.
00:43:45.080 | At the end, they wrote that this predicate is good
00:43:50.080 | for different situations, for movies, for theater.
00:43:56.440 | - By the way, there's also criticism, right?
00:44:00.640 | There's another way to interpret narratives
00:44:03.840 | from Claude Lévi-Strauss.
00:44:08.840 | - I don't know.
00:44:10.920 | I am not in this business.
00:44:12.640 | - No, I know, it's theoretical literature,
00:44:14.400 | but it's looking at paradigms behind the scenes.
00:44:17.240 | - It's always the--
00:44:18.240 | - Philosophers argue. - Discussion, yeah.
00:44:20.160 | But at least there are units.
00:44:23.800 | There are not too many units that can describe it,
00:44:27.200 | but this guy probably gives other units,
00:44:30.880 | or another way of--
00:44:31.760 | - Exactly, another set of units.
00:44:34.440 | - Another set of predicates.
00:44:35.960 | It doesn't matter how, but they exist, probably.
00:44:40.960 | - My question is whether given those units,
00:44:46.240 | whether without our human brains to interpret these units,
00:44:50.360 | they would still hold as much power as they have.
00:44:53.480 | Meaning, are those units enough
00:44:56.220 | when we give them to an alien species?
00:44:58.880 | - Let me ask you.
00:45:00.320 | Do you understand digit images?
00:45:05.320 | - No, I don't understand.
00:45:07.720 | - No, no, no.
00:45:08.640 | When you can recognize these digit images,
00:45:11.000 | it means that you understand.
00:45:12.480 | - Yes, I understand.
00:45:14.200 | - You understand characters, you understand--
00:45:17.280 | - No, no, no, no.
00:45:18.920 | It's the imitation versus understanding question,
00:45:25.480 | because I don't understand the mechanism
00:45:28.360 | by which I understand.
00:45:29.200 | - No, no, I'm not talking about,
00:45:30.480 | I'm talking about predicates.
00:45:32.800 | You understand that it involves symmetry,
00:45:35.160 | maybe structure, maybe something.
00:45:37.440 | I cannot formulate it.
00:45:38.720 | I just was able to find symmetries,
00:45:41.840 | the degree of symmetries.
00:45:43.680 | - That's really good.
00:45:44.520 | So this is a good line.
00:45:46.400 | I feel like I understand the basic elements
00:45:50.600 | of what makes a good handwriting recognition system, my own.
00:45:54.320 | Like symmetry connects with me.
00:45:56.440 | It seems like that's a very powerful predicate.
00:45:59.160 | My question is, is there a lot more going on
00:46:02.400 | that we're not able to introspect?
00:46:04.500 | Maybe I need to be able to understand
00:46:09.640 | a huge amount in the world of ideas,
00:46:13.080 | thousands of predicates, millions of predicates,
00:46:18.440 | in order to do handwriting recognition.
00:46:20.600 | - I don't think so.
00:46:21.560 | - So you're--
00:46:24.840 | - Both your hope and your intuition
00:46:26.560 | are such that-- - No, let me explain.
00:46:28.960 | You're using digits, you're using examples as well.
00:46:33.500 | Theory says that if you use
00:46:37.640 | all possible functions
00:46:42.480 | from Hilbert space, all possible predicates,
00:46:46.320 | you don't need training data.
00:46:47.960 | You will just have an admissible set of functions
00:46:53.800 | which contains one function.
00:46:55.200 | - Yes.
00:46:57.120 | So the trade-off is when you're not using all predicates,
00:47:01.120 | you're only using a few good predicates,
00:47:02.960 | you need to have some training data.
00:47:05.000 | - Yes, exactly.
00:47:06.760 | - The more good predicates you have,
00:47:08.440 | the less training data you need.
00:47:09.680 | - Exactly.
00:47:10.960 | That is intelligent learning.
00:47:13.280 | - Still, okay.
00:47:14.720 | I'm gonna keep asking the same dumb question,
00:47:17.400 | handwritten recognition.
00:47:19.120 | To solve the challenge, you kind of propose a challenge
00:47:21.560 | that says we should be able to get state-of-the-art
00:47:24.640 | MNIST error rates by using very few,
00:47:28.800 | 60, maybe fewer examples per digit.
00:47:31.520 | What kind of predicates do you think you'll--
00:47:35.720 | - That is the challenge.
00:47:37.560 | So people who will solve this problem--
00:47:39.840 | - They will answer.
00:47:40.680 | - They will answer.
00:47:41.520 | - Do you think they'll be able to answer it
00:47:44.780 | in a human explainable way?
00:47:46.580 | - They just need to write a function, that's it.
00:47:50.820 | - But, so can that function be written, I guess,
00:47:54.320 | by an automated reasoning system?
00:47:58.740 | Whether we're talking about a neural network
00:48:01.120 | learning a particular function, or another mechanism?
00:48:05.080 | - No, I'm not against neural networks.
00:48:08.560 | I'm against the admissible set of functions
00:48:11.600 | which neural networks create.
00:48:13.720 | You did it by hand.
00:48:15.240 | You don't do it by invariants,
00:48:19.880 | by predicates, by reason.
00:48:23.360 | - But neural networks can then reverse,
00:48:26.380 | do the reverse step of helping you find a function.
00:48:29.940 | Just, the task of a neural network
00:48:33.600 | is to find a disentangled representation, for example,
00:48:38.180 | as they call it, is to find that one predicate function
00:48:42.100 | that really captures some kind of essence.
00:48:45.180 | One, not the entire essence,
00:48:46.860 | but one very useful essence of this particular visual space.
00:48:51.860 | Do you think that's possible?
00:48:54.060 | Listen, I'm grasping, hoping there's an automated way
00:48:58.620 | to find good predicates.
00:49:00.300 | So the question is, what are the mechanisms
00:49:02.980 | of finding good predicates, ideas,
00:49:05.740 | that you think we should pursue?
00:49:08.020 | A young grad student listening right now.
00:49:10.020 | - I gave an example.
00:49:13.420 | So find a situation where the predicate
00:49:18.420 | which you're suggesting doesn't create invariance.
00:49:25.000 | It's like in physics.
00:49:28.820 | Find a situation where the existing theory cannot explain it.
00:49:33.820 | - Find situation where the existing theory
00:49:39.380 | can't explain it. - Theory cannot explain
00:49:40.700 | this situation. - So you're finding
00:49:41.580 | contradictions.
00:49:42.780 | Find a contradiction, and then remove this contradiction.
00:49:46.140 | But in my case, what contradiction means:
00:49:48.940 | you find a function which, if you use this function,
00:49:53.500 | you're not keeping invariants.
00:49:55.060 | - So it's really the process of discovering contradictions.
00:50:01.300 | - Yeah.
00:50:02.140 | It is like in physics.
00:50:05.900 | Find a situation where you have a contradiction
00:50:09.820 | for one of the properties.
00:50:12.960 | For one of the predicates.
00:50:15.520 | Then include this predicate, making it invariant,
00:50:19.040 | and solve this problem again.
00:50:20.480 | Now you don't have a contradiction.
00:50:22.120 | But it is not the best way, probably, I don't know,
00:50:28.320 | to look for predicates.
00:50:32.000 | - It's just one way, okay.
00:50:33.600 | - That, no, no, it is the brute force way.
00:50:35.920 | - The brute force way.
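The brute-force step Vapnik describes (take the trained function, look for a predicate whose invariant it breaks, then add that predicate and retrain) might look like this in a sketch. Everything here is hypothetical: the data, the predicate bank, and the "trained" function `f_hat`, chosen to match the class frequency while putting its positives in the wrong places. The raw discrepancies depend on how each predicate is scaled, so in practice one would presumably normalize them before comparing.

```python
import numpy as np

x = np.arange(-10, 11, dtype=float)              # toy inputs
y = (x > 0).astype(float)                        # true labels: 10 positives
f_hat = (np.abs(x) >= 6).astype(float)           # hypothetical fit: also 10 positives, wrong places

bank = {"const": lambda v: np.ones_like(v),      # hypothetical predicate bank
        "x": lambda v: v,
        "x^2": lambda v: v ** 2}

# Discrepancy of each invariant: |mean(psi * f_hat) - mean(psi * y)|.
violations = {name: abs(np.mean(psi(x) * f_hat) - np.mean(psi(x) * y))
              for name, psi in bank.items()}
worst = max(violations, key=violations.get)
print({k: round(float(v), 3) for k, v in violations.items()}, worst)
# → {'const': 0.0, 'x': 2.619, 'x^2': 13.095} x^2
```

The constant predicate sees no contradiction, while the `x` and `x^2` predicates expose it; those are the candidates to turn into invariants before solving the problem again.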
00:50:37.320 | What about the ideas of, what,
00:50:42.280 | the big umbrella term of symbolic AI.
00:50:44.680 | There's, in the '80s, expert systems,
00:50:48.520 | sort of logic, reasoning-based systems.
00:50:51.400 | Is there hope there to find some,
00:50:55.720 | through sort of deductive reasoning,
00:51:00.480 | to find good predicates?
00:51:04.440 | - I don't think so.
00:51:07.680 | I think that just logic is not enough.
00:51:12.000 | - It's kind of a compelling notion, though.
00:51:14.400 | You know, that when smart people sit in a room
00:51:17.600 | and reason through things, it seems compelling.
00:51:20.360 | And making our machines do the same is also compelling.
00:51:23.540 | - So everything is very simple.
00:51:27.820 | When you have an infinite number of predicates,
00:51:34.080 | you can choose the function you want.
00:51:38.600 | You have invariants, and you can choose the function you want.
00:51:42.540 | But you have to have not too many invariants
00:51:47.540 | to solve the problem.
00:51:53.000 | So, how, from an infinite number of functions,
00:51:59.940 | to select a finite number, and hopefully a small number,
00:52:04.940 | of functions which is good enough
00:52:11.080 | to extract a small set of admissible functions.
00:52:16.680 | So they will be admissible, that's for sure,
00:52:19.800 | because every function just decreases the set of functions
00:52:23.880 | and leaves it admissible.
00:52:25.680 | But it will be small.
00:52:27.720 | - But why do you think logic-based systems can't help?
00:52:32.720 | Intuition, not--
00:52:35.280 | - Because you should know reality.
00:52:37.800 | You should know life.
00:52:39.480 | This guy, Propp, knows something.
00:52:44.280 | And he tried to put his understanding into invariants.
00:52:49.280 | - But that's the human, yeah, but see,
00:52:51.560 | you're putting too much value into
00:52:54.460 | Vladimir Propp knowing something.
00:52:57.900 | - No, it is--
00:52:59.460 | - I'm minding the subject.
00:53:01.100 | - What does it mean, you know life?
00:53:02.900 | What does it mean?
00:53:05.380 | - You know common sense.
00:53:07.020 | - No, no.
00:53:08.380 | You know something.
00:53:10.380 | Common sense, it is some rules.
00:53:13.420 | - You think so?
00:53:14.820 | Common sense is simply rules?
00:53:17.180 | Common sense is everything, it's mortality,
00:53:21.820 | it's fear of death, it's love, it's spirituality,
00:53:26.820 | it's happiness and sadness.
00:53:30.820 | All of it is tied up into understanding gravity,
00:53:34.420 | which is what we think of as common sense.
00:53:36.860 | - I don't really want to discuss it so widely.
00:53:39.820 | I want to discuss, understand,
00:53:42.400 | digit recognition.
00:53:45.420 | - Any time I bring up love and death,
00:53:47.660 | you bring it back to digit recognition.
00:53:50.460 | - Yeah, no, you know, it is doable
00:53:52.980 | because there is a challenge,
00:53:54.900 | which I see how to solve.
00:53:59.260 | If I had a student concentrating on this work,
00:54:02.500 | I would suggest something to solve.
00:54:04.780 | - You mean handwritten recognition?
00:54:06.860 | Yeah, it's a beautifully simple, elegant, and yet--
00:54:10.780 | - I think that I know invariants which will solve this.
00:54:13.440 | - You do?
00:54:14.280 | - I think so, yes.
00:54:15.940 | But it is not universal, it is maybe,
00:54:20.940 | I want some universal invariants which are good
00:54:25.060 | not only for digit recognition, but for image understanding.
00:54:28.540 | - So let me ask, how hard do you think
00:54:34.180 | is 2D image understanding?
00:54:37.100 | So if we can kind of intuit handwritten recognition,
00:54:43.820 | how big of a step, leap, journey is it from that?
00:54:48.820 | If I gave you good, if I solved your challenge
00:54:51.980 | for handwritten recognition,
00:54:53.620 | how long would my journey then be from that
00:54:56.520 | to understanding more general natural images?
00:54:59.380 | - Immediately, you will understand this
00:55:01.940 | as soon as you make a record.
00:55:04.060 | Because it is not for free.
00:55:07.740 | As soon as you create several invariants
00:55:13.020 | which will help you to get the same performance
00:55:18.020 | that the best neural net did,
00:55:22.820 | using 100 times, maybe more than 100 times, fewer examples,
00:55:27.820 | you have to have something smart to do that.
00:55:31.260 | - And you're saying--
00:55:32.260 | - That is an invariant, it is a predicate.
00:55:35.220 | Because you should put in some idea of how to do that.
00:55:39.460 | - But okay, let me just pause.
00:55:42.380 | Maybe it's a trivial point, maybe not.
00:55:44.500 | But handwritten recognition feels like a 2D,
00:55:48.820 | two-dimensional problem.
00:55:50.440 | And how much more complicated is it, given the fact
00:55:55.340 | that most images are a projection
00:55:58.020 | of a three-dimensional world onto a 2D plane.
00:56:03.020 | It feels like for a three-dimensional world,
00:56:05.900 | we need to start understanding common sense
00:56:08.660 | in order to understand an image.
00:56:10.920 | It's no longer visual shape and symmetry.
00:56:16.980 | It's having to start to understand concepts,
00:56:20.740 | understand life.
00:56:22.100 | - Yeah.
00:56:22.940 | You're saying that there are different invariants.
00:56:27.300 | Different predicates, yeah.
00:56:28.900 | - And potentially a much larger number.
00:56:32.500 | - You know, maybe.
00:56:34.340 | But let's start from simple.
00:56:36.340 | - Well, yeah, but you said that it would be--
00:56:38.020 | - But you know, I cannot think about things
00:56:41.420 | which I don't understand.
00:56:43.300 | This I understand.
00:56:44.820 | But I'm sure that I don't understand everything there.
00:56:48.460 | - Yeah, that's the difference.
00:56:49.300 | - It's like Einstein said: do it as simple as possible,
00:56:53.140 | but not simpler.
00:56:54.380 | And that is exactly the case.
00:56:56.560 | - With handwritten recognition.
00:56:57.400 | - With handwritten.
00:56:58.980 | - Yeah, but never, that's the difference between you and me.
00:57:04.940 | I welcome and enjoy thinking about things
00:57:07.940 | I completely don't understand.
00:57:09.900 | Because to me, it's a natural extension
00:57:12.380 | without having solved handwritten recognition
00:57:15.140 | to wonder how difficult the next step is,
00:57:20.140 | understanding 2D, 3D images.
00:57:25.680 | Because ultimately, while the science of intelligence
00:57:29.260 | is fascinating, it's also fascinating to see
00:57:31.700 | how that maps to the engineering of intelligence.
00:57:34.700 | And recognizing handwritten digits is not,
00:57:39.340 | doesn't help you, it might, it may not help you
00:57:43.100 | with the problem of general intelligence.
00:57:46.560 | We don't know.
00:57:47.400 | It'll help you a little bit, we don't know how much.
00:57:49.500 | - It's unclear.
00:57:50.340 | - It's unclear.
00:57:51.160 | - Yeah.
00:57:52.000 | - It might very much.
00:57:52.840 | - But I would like to make a remark.
00:57:53.660 | - Yes.
00:57:54.500 | - I didn't start from a very primitive problem
00:57:58.780 | to make a challenge problem.
00:58:03.100 | I started with a very general problem, with Plato.
00:58:06.800 | So you understand, and it comes from Plato
00:58:10.660 | to digit recognition.
00:58:14.500 | - So you basically took Plato and the world of forms
00:58:19.140 | and ideas and mapped and projected it into the clearest,
00:58:23.900 | simplest formulation of that big world.
00:58:26.820 | - You know, I would say that I did not understand Plato
00:58:31.540 | until recently, until I considered weak convergence
00:58:36.540 | and then predicates, and then: oh, this is what Plato taught.
00:58:43.380 | - So--
00:58:46.300 | - Can you linger on that?
00:58:47.120 | Like why, how do you think about this world of ideas
00:58:50.180 | and world of things in Plato?
00:58:51.980 | - No, it is a metaphor, it is--
00:58:54.860 | - It's a metaphor for sure.
00:58:55.820 | - Yeah.
00:58:56.660 | - It's a poetic and a beautiful metaphor.
00:58:57.820 | - Yeah, yeah, yeah.
00:58:58.740 | - But what, can you--
00:59:00.540 | - But it is a way you should try to understand
00:59:04.980 | how to attack ideas in the world.
00:59:07.900 | So from my point of view, it is very clear,
00:59:12.900 | but it is a line.
00:59:14.900 | All the time people are looking for that.
00:59:17.540 | Say, Plato, and Hegel: whatever is reasonable exists,
00:59:22.540 | whatever exists is reasonable.
00:59:26.700 | I don't know what he had in mind by reasonable.
00:59:30.240 | - Right, there are the philosophers again.
00:59:31.580 | - No, no, no, no, no, no, no, no.
00:59:33.300 | It is the next step, Wigner: that mathematics
00:59:38.100 | understands something of reality.
00:59:40.740 | It is the same Plato line.
00:59:42.440 | And then it comes suddenly to Vladimir Propp.
00:59:47.100 | Look, 31 ideas, 31 units, and they describe everything.
00:59:52.880 | - There are abstractions, ideas that represent our world.
01:00:00.160 | And we should always try to reach into that.
01:00:03.320 | - Yeah, but you should make a projection onto reality.
01:00:07.520 | But understanding is, it is abstract ideas.
01:00:11.820 | You have in your mind several abstract ideas
01:00:15.880 | which you can apply to reality.
01:00:17.800 | - And reality in this case,
01:00:19.160 | so if you look at machine learning, is data.
01:00:21.400 | - It's examples, data.
01:00:22.720 | - Data.
01:00:24.080 | Okay, let me put this on you,
01:00:26.280 | because I'm an emotional creature.
01:00:28.360 | I'm not a mathematical creature like you.
01:00:30.780 | I find compelling the idea,
01:00:33.400 | forget the space, the sea of functions.
01:00:36.660 | There's also a sea of data in the world.
01:00:39.520 | And I find compelling that there might be,
01:00:42.280 | like you said, teacher, small examples of data
01:00:47.280 | that are most useful for discovering good,
01:00:52.620 | whether it's predicates or good functions,
01:00:55.540 | that the selection of data may be a powerful journey,
01:01:00.300 | a useful, you know, coming up with a mechanism
01:01:03.740 | for selecting good data might be useful too.
01:01:06.460 | Do you find this idea of finding the right data set
01:01:12.420 | interesting at all?
01:01:13.960 | Or do you kind of take the data set as a given?
01:01:16.680 | - I think that it is, you know, my scheme is very simple.
01:01:22.620 | You have a huge set of functions.
01:01:25.880 | If you apply it, and you have not too much data,
01:01:30.880 | if you pick up a function which describes this data,
01:01:36.480 | you will not do very well.
01:01:39.940 | - Like, randomly pick one?
01:01:42.240 | - Yeah, you will overfit, it will be overfitting.
01:01:45.440 | So you should decrease the set of functions
01:01:50.160 | from which you're picking one.
01:01:53.660 | So you should go somehow to an admissible set of functions.
01:01:58.080 | And this is what weak convergence is about.
01:02:02.360 | So but, from another point of view,
01:02:07.240 | to make an admissible set of functions,
01:02:13.200 | you need just data, just functions
01:02:15.320 | which you will take in an inner product,
01:02:19.400 | with which you will measure a property of your function.
01:02:24.400 | And that is how it works.
01:02:31.180 | - No, I get it, I get it, I understand it,
01:02:32.740 | but do you, the reality is--
01:02:34.980 | - But let's think about examples.
01:02:39.140 | You have a huge set of functions,
01:02:41.860 | and you have several examples.
01:02:44.640 | If you just try to take a function
01:02:49.640 | which satisfies these examples,
01:02:52.580 | you will still overfit.
01:02:55.620 | You need to decrease it, you need an admissible set of functions.
01:02:59.220 | - Yeah, absolutely.
01:03:00.160 | But what, say you have more data than functions.
01:03:05.060 | So sort of consider the, I mean,
01:03:08.300 | maybe not more data than functions,
01:03:09.800 | 'cause that's-- - It's impossible.
01:03:11.300 | - Impossible.
01:03:12.140 | But what, I was trying to be poetic for a second.
01:03:15.160 | I mean, you have a huge amount of data,
01:03:17.200 | a huge amount of examples.
01:03:19.880 | - But the number of functions can be even--
01:03:22.400 | - It can get bigger, I understand.
01:03:24.360 | - Everything can--
01:03:25.520 | - There's always a bigger boat.
01:03:27.560 | - Full Hilbert space.
01:03:29.280 | - I gotcha.
01:03:30.260 | But okay.
01:03:31.840 | But you don't find the world of data
01:03:35.840 | to be an interesting optimization space.
01:03:38.760 | Like the optimization should be in the space of functions.
01:03:42.280 | - In creating an admissible set of functions.
01:03:46.980 | - Admissible set of functions.
01:03:48.140 | - No, you know, even from classical VC theory,
01:03:52.440 | from structural risk minimization,
01:03:56.380 | you should organize functions in a way
01:04:01.380 | that they will be useful for you.
01:04:06.540 | - Right.
01:04:07.540 | - And that is admissible set.
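Structural risk minimization, which Vapnik invokes here, arranges functions into nested subsets S₀ ⊂ S₁ ⊂ … of growing capacity and picks the subset that minimizes empirical risk plus a capacity term. A rough sketch with loud assumptions: polynomial degree d stands in for the structure, h = d + 1 is used as its capacity, and the classical VC confidence term (derived for classification) is simply added to a squared-error training loss. That combination is a heuristic, not the exact bound for regression, and the data are synthetic.

```python
import numpy as np

def vc_confidence(h, n, eta=0.05):
    """Classical VC confidence term for capacity h, n samples, confidence 1 - eta."""
    return np.sqrt((h * (np.log(2 * n / h) + 1) - np.log(eta / 4)) / n)

rng = np.random.default_rng(0)
n = 40
x = np.linspace(-1, 1, n)
y = np.sin(3 * x) + 0.2 * rng.normal(size=n)     # synthetic data (hypothetical)

emp, conf = [], []
for d in range(7):                               # nested classes: polynomials of degree <= d
    coef = np.polyfit(x, y, d)
    emp.append(np.mean((np.polyval(coef, x) - y) ** 2))   # training error in S_d
    conf.append(vc_confidence(d + 1, n))                  # capacity penalty for S_d

best = int(np.argmin(np.array(emp) + np.array(conf)))
print("chosen degree:", best)
```

Training error can only go down along the nested structure and the capacity term only goes up, so the chosen element is where their sum is smallest, which is the sense in which the functions are "organized to be useful."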
01:04:10.300 | - The way you're thinking about useful
01:04:12.620 | is you're given a small set of examples.
01:04:16.940 | - Useful small.
01:04:17.820 | A small set of functions which contains the function I'm looking for.
01:04:21.820 | - Yeah, but looking for based on
01:04:25.300 | the empirical set of small examples.
01:04:27.620 | - Yeah, but that is another story, I don't touch it.
01:04:31.180 | Because I believe that this small examples
01:04:35.740 | is not too small.
01:04:37.380 | So 60 per class, law of large numbers works.
01:04:41.380 | I don't need uniform law.
01:04:43.380 | The story is that in statistics there are two law.
01:04:46.740 | Law of large numbers and uniform law of large numbers.
01:04:51.100 | So I want to be in a situation where I use the law
01:04:55.060 | of large numbers but not the uniform law of large numbers.
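A single function fixed in advance needs only the ordinary law of large numbers, and that law comes with finite-sample bounds. As a minimal sketch, assuming the predicate takes values in [0, 1] and using Hoeffding's inequality as a stand-in (the conversation does not say which bound Vapnik has in mind), this is how tight the estimate of one predicate's expectation is at 60 examples per class:

```python
import math

def hoeffding_epsilon(n: int, delta: float) -> float:
    """Half-width eps of a confidence interval for ONE function fixed in advance.

    For i.i.d. values in [0, 1], Hoeffding's inequality gives
    P(|empirical mean - expectation| >= eps) <= 2 * exp(-2 * n * eps**2);
    setting the right-hand side to delta and solving for eps gives this formula.
    """
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# 60 examples per class, 95% confidence, a single predicate.
eps_60 = hoeffding_epsilon(60, 0.05)
# 100x more data shrinks the interval 10x (the 1/sqrt(n) rate).
eps_6000 = hoeffding_epsilon(6000, 0.05)
print(f"n=60:   +/- {eps_60:.3f}")
print(f"n=6000: +/- {eps_6000:.3f}")
```

At n = 60 the half-width is roughly 0.175: loose, but it holds for one predicate regardless of how many other functions exist, which is exactly what the uniform law cannot offer.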
01:04:58.260 | - Right, so 60 is large enough for the law of large numbers.
01:05:01.420 | - I hope. No, it still needs some evaluation,
01:05:05.580 | some bounds, but the idea is the following.
01:05:10.060 | If you trust that, say, this average gives you
01:05:16.580 | something close to the expectation,
01:05:21.020 | then you can talk about that, about this predicate.
01:05:26.020 | And that is the basis of human intelligence.
01:05:29.800 | - Good predicates is the, the discovery of good predicates
01:05:33.740 | is the basis of human intelligence.
01:05:34.580 | - No, no, it is the discovery of your understanding of the world.
01:05:39.580 | Of your methodology for understanding the world.
01:05:43.560 | Because you have several functions
01:05:47.260 | which you will apply to reality.
01:05:49.080 | - Can you say that again?
01:05:52.500 | So you're--
01:05:54.420 | - You have several functions, predicates.
01:05:57.560 | But they're abstract.
01:05:59.900 | Then you will apply them to reality, to your data.
01:06:04.340 | And in this way you will create a predicate
01:06:07.420 | which is useful for your task.
01:06:09.660 | But predicates are not related specifically to your task,
01:06:16.420 | to this task; they are abstract functions
01:06:20.100 | which, being applied to--
01:06:23.260 | - Many tasks that you might be interested in.
01:06:25.260 | - It might be many tasks, I don't know.
01:06:27.660 | - Well--
01:06:28.660 | - Different tasks.
01:06:29.940 | - Well they should be many tasks, right?
01:06:31.660 | - I believe so, like in Propp's case.
01:06:35.680 | It was for fairy tales, but it happens everywhere.
01:06:38.540 | - Okay, so we talked about images a little bit,
01:06:42.180 | but can we talk about Noam Chomsky for a second?
01:06:45.780 | (laughing)
01:06:49.020 | - I believe I don't know him very well.
01:06:54.220 | - Personally, well--
01:06:55.660 | - Not personally, I don't know his ideas.
01:06:58.260 | - Well let me just say, do you think language,
01:07:01.020 | human language, is essential to expressing ideas,
01:07:05.780 | as Noam Chomsky believes?
01:07:08.340 | So like, language is at the core
01:07:10.140 | of our formation of predicates.
01:07:12.920 | It's like human language--
01:07:14.940 | - For me, language, and all the story of language,
01:07:18.580 | is very complicated.
01:07:20.740 | I don't understand this, and I'm not, I thought about--
01:07:25.740 | - Nobody does.
01:07:26.560 | - I'm not ready to work on that, because it's so huge.
01:07:30.780 | It is not for me, and I believe not for our century.
01:07:34.240 | - The 21st century.
01:07:37.340 | - Not for 21st century.
01:07:39.180 | - So--
01:07:40.020 | - We should learn something, a lot of stuff,
01:07:42.180 | from simple tasks, like digit recognition.
01:07:45.100 | - So you think, okay, you think digit recognition,
01:07:49.260 | 2D image, how would you more abstractly define
01:07:54.260 | digit recognition?
01:07:56.460 | It's 2D image, symbol recognition, essentially?
01:08:01.460 | I mean, I'm trying to get a sense,
01:08:08.100 | sort of thinking about it now,
01:08:09.700 | having worked with MNIST forever,
01:08:12.880 | how small of a subset is this,
01:08:16.020 | of the general vision recognition problem,
01:08:18.580 | and the general intelligence problem?
01:08:20.460 | Is it, yeah, is it a giant subset?
01:08:26.340 | Is it not?
01:08:27.820 | And how far away is language?
01:08:30.220 | - You know, let me refer to Einstein.
01:08:33.420 | Take the simplest problem, as simple as possible,
01:08:38.300 | but not simpler. And this is the challenge problem:
01:08:41.780 | it is a simple problem, simple in idea,
01:08:46.780 | but not simple to solve.
01:08:50.360 | When you do this, you will find some predicate,
01:08:55.900 | which helps you to do it.
01:08:57.180 | - Well, yeah, I mean, with Einstein,
01:08:59.420 | you can, you look at general relativity,
01:09:04.140 | but that doesn't help you with quantum mechanics.
01:09:06.580 | - That's another story,
01:09:08.740 | you don't have any universal instrument.
01:09:11.840 | - Yeah, so I'm trying to wonder if,
01:09:15.380 | which space we're in, whether the,
01:09:17.540 | whether handwritten recognition is like general relativity,
01:09:21.140 | and then language is like quantum mechanics,
01:09:23.140 | so you're still gonna have to do a lot of mess
01:09:26.940 | to universalize it, but I'm trying to see,
01:09:31.940 | so what's your intuition why handwritten recognition
01:09:39.140 | is easier than language?
01:09:40.900 | Just, I think a lot of people would agree with that,
01:09:45.300 | but if you could elucidate sort of the intuition of why.
01:09:51.780 | - I don't, no, no, I don't think in this direction.
01:09:56.460 | I just think in the direction that this is a problem
01:09:59.560 | which, if we solve it well,
01:10:05.140 | will create some abstract understanding of images.
01:10:12.740 | Maybe not all images.
01:10:19.700 | I would like to talk to guys who doing Unreal images
01:10:24.020 | in Columbia University.
01:10:26.260 | - What kind of images, Unreal?
01:10:28.420 | - Unreal images. - Real images.
01:10:29.820 | - Yeah, what their idea is,
01:10:32.340 | the real predicate, what can be a predicate.
01:10:35.140 | I still think symmetry will play a role in real-life images,
01:10:40.140 | in any real-life images, 2D images,
01:10:43.900 | let's talk about 2D images.
01:10:46.320 | Because that's what we know.
01:10:51.320 | A neural network was created for 2D images.
01:10:55.940 | - So the people I know in vision science, for example,
01:10:58.660 | the people who study human vision,
01:11:01.000 | that they usually go to the world of symbols
01:11:04.500 | and like handwritten recognition,
01:11:06.360 | but not really, it's other kinds of symbols
01:11:08.460 | to study our visual perception system.
01:11:11.560 | As far as I know, not much predicate type of thinking
01:11:15.180 | is understood about our vision system.
01:11:17.620 | - They did not think in this direction.
01:11:19.420 | - They don't, yeah, but how do you even begin
01:11:21.740 | to think in that direction?
01:11:23.500 | - That's, I would like to discuss with them.
01:11:26.900 | - Yeah.
01:11:27.740 | - Because if we will be able to show that it is worth working
01:11:32.740 | and theoretical scheme, it's not so bad.
01:11:40.340 | - So the unfortunate, so if we compare to language,
01:11:43.340 | language is like letters, a finite set of letters
01:11:46.520 | and a finite set of ways you can put together those letters,
01:11:50.500 | so it feels more amenable to kind of analysis.
01:11:53.720 | With natural images, there are so many pixels.
01:11:58.680 | - No, no, no, letter, language is much,
01:12:02.020 | much more complicated.
01:12:03.660 | It involves a lot of different stuff.
01:12:08.020 | It's not just understanding of very simple class of tasks.
01:12:14.020 | I would like to see lists of tasks with language involved.
01:12:19.020 | - Yes, so there's a lot of nice benchmarks now
01:12:23.220 | in natural language processing from the very trivial,
01:12:26.480 | like understanding the elements of a sentence
01:12:30.180 | to question answering to much more complicated
01:12:33.060 | where you talk about open domain dialogue.
01:12:36.100 | The natural question is with handwritten recognition,
01:12:39.240 | it's really the first step of understanding
01:12:42.960 | visual information.
01:12:44.600 | - Right, but even our records show that we go
01:12:49.600 | in the wrong direction because we need 60,000 digits.
01:12:56.580 | - So even this first step, so forget about talking
01:12:59.660 | about the full journey, this first step should be taken
01:13:02.580 | in the right direction.
01:13:03.420 | - No, no, in the wrong direction
01:13:04.540 | because 60,000 is unacceptable.
01:13:07.180 | - No, I'm saying it should be taken in the right direction
01:13:11.020 | because 60,000 is not acceptable.
01:13:13.660 | - If you can talk, it's great, we have half percent of error.
01:13:18.480 | - And hopefully the step from doing handwritten recognition
01:13:22.760 | using very few examples, the step towards what babies do
01:13:26.840 | when they crawl and understand their physical environment.
01:13:29.240 | - I don't know what babies do.
01:13:30.200 | - I know you don't know about babies.
01:13:31.760 | - If you do it from very small examples,
01:13:36.080 | you will find principles which are different
01:13:40.560 | from what we're using now.
01:13:43.080 | And theoretically it's more or less clear.
01:13:48.360 | That means that you will use weak convergence,
01:13:52.280 | not just strong convergence.
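One way to make "use weak convergence" concrete, following our reading of Vapnik's published work on "statistical invariants" rather than anything spelled out in this exchange: besides asking the estimate to be close to the target in norm, require that for each chosen predicate psi, the psi-weighted average of the model's estimates of P(y=1 | x) matches the psi-weighted average of the observed labels. The sketch below computes those gaps; the two predicates and the toy data are hypothetical.

```python
import numpy as np

def invariant_gaps(f_values, y, predicates):
    """Gap in each statistical invariant for a candidate function f.

    For every predicate psi_s, compare the psi-weighted average of the
    model's estimates of P(y=1 | x_i) with the psi-weighted average of the
    observed labels: |mean(psi_s(x) * f(x)) - mean(psi_s(x) * y)|.
    Small gaps for all chosen psi_s are a finite-sample analogue of weak
    convergence, restricted to those predicates.
    """
    n = len(y)
    lhs = predicates @ f_values / n
    rhs = predicates @ y / n
    return np.abs(lhs - rhs)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
# Two hypothetical predicates: psi_0(x) = 1 and psi_1(x) = x.
predicates = np.vstack([np.ones_like(x), x])

constant_guess = np.full_like(x, 0.5)   # ignores x entirely
gaps = invariant_gaps(constant_guess, y, predicates)
print(gaps)
```

The constant guess predicts a class frequency of 0.5 where the data show 0.6, and an x-weighted average of 0 where the data show 0.6, so both gaps are nonzero (about 0.1 and 0.6). A learner would add such constraints to its usual objective rather than check them afterwards.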
01:13:54.480 | - Do you think these principles will naturally
01:13:59.280 | be human interpretable?
01:14:01.680 | - Oh yeah.
01:14:02.560 | - So like when we'll be able to explain them
01:14:04.480 | and have a nice presentation to show
01:14:06.240 | what those principles are?
01:14:07.600 | Or are they going to be very kind of abstract
01:14:12.600 | kinds of functions?
01:14:14.440 | - For example, I talked yesterday about symmetry.
01:14:17.680 | - Yes.
01:14:18.720 | - And I gave very simple examples.
01:14:20.440 | The same will be like that.
01:14:22.040 | - You gave like a predicate of a basic for--
01:14:24.680 | - For symmetries.
01:14:25.760 | - Yes, for different symmetries and you have for--
01:14:29.560 | - Degree of symmetry, that is important, not just symmetry.
01:14:33.680 | Not whether it exists or doesn't exist, but the degree of symmetry.
01:14:37.280 | - Yeah, for handwritten recognition.
01:14:40.240 | - No, it's not for handwritten, it's for images.
01:14:45.160 | But I would like to apply it to handwritten.
01:14:47.720 | - Right, in theory it's more general.
01:14:49.760 | Okay, okay.
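Vapnik's point that the degree of symmetry matters, not just its presence, can be illustrated with one possible scoring function. The definition below (overlap between an image and its mirror image, normalized by the image's energy) is our own assumption for illustration, not the predicate from his lecture; it returns a number in [0, 1] instead of a yes/no answer.

```python
import numpy as np

def symmetry_degree(img, axis="vertical"):
    """Hypothetical degree-of-symmetry predicate for a nonnegative 2D image.

    Returns <img, mirror(img)> / <img, img>. For nonnegative pixels this lies
    in [0, 1] (Cauchy-Schwarz, since mirroring preserves the norm): 1 means the
    image equals its mirror, 0 means image and mirror never overlap.
    """
    img = np.asarray(img, dtype=float)
    if axis == "vertical":        # mirror left-right across a vertical axis
        mirror = np.fliplr(img)
    elif axis == "horizontal":    # mirror top-bottom across a horizontal axis
        mirror = np.flipud(img)
    else:
        raise ValueError("axis must be 'vertical' or 'horizontal'")
    energy = float(np.sum(img * img))
    if energy == 0.0:
        return 0.0                # blank image: define the degree as 0
    return float(np.sum(img * mirror)) / energy

# Toy 0/1 "digits" (hypothetical, not MNIST data).
zero_like = np.array([[0, 1, 1, 0],
                      [1, 0, 0, 1],
                      [0, 1, 1, 0]])
seven_like = np.array([[1, 1, 1],
                       [0, 0, 1],
                       [0, 1, 0]])
print(symmetry_degree(zero_like), symmetry_degree(zero_like, "horizontal"))
print(symmetry_degree(seven_like), symmetry_degree(seven_like, "horizontal"))
```

The "zero" scores 1 on both axes, while the "seven" gets graded values (0.8 and 0.6), which is the kind of continuous feature a degree-of-symmetry predicate would feed into the invariants.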
01:14:50.920 | So a lot of the things we've been talking about falls,
01:14:59.800 | we've been talking about philosophy a little bit,
01:15:01.840 | but also about mathematics and statistics.
01:15:05.520 | A lot of it falls into this idea,
01:15:08.080 | a universal idea of statistical theory of learning.
01:15:10.740 | What is the most beautiful and sort of powerful
01:15:16.800 | or essential idea you've come across,
01:15:19.120 | even just for yourself personally,
01:15:20.800 | in the world of statistics or statistical theory of learning?
01:15:25.480 | - Probably uniform convergence, which we did
01:15:29.520 | with Alexey Chervonenkis.
01:15:33.000 | - Can you describe uniform convergence?
01:15:36.040 | - You have the law of large numbers.
01:15:38.980 | So for any function, the average of the function
01:15:44.480 | converges to the expectation of the function.
01:15:48.120 | But if you have a set of functions,
01:15:50.520 | it is true for each function separately.
01:15:52.340 | But it should converge simultaneously
01:15:55.560 | for the whole set of functions.
01:15:57.500 | And for learning, you need
01:16:04.960 | uniform convergence; just convergence is not enough.
01:16:08.540 | Because when you pick the one which gives the minimum,
01:16:15.680 | you can pick a function whose average does not converge,
01:16:21.660 | and it will give you the best answer on your data.
01:16:28.020 | So you need uniform convergence to guarantee learning.
01:16:34.920 | So learning does not rely on the trivial law of large numbers,
01:16:39.920 | it relies on the uniform one.
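The failure Vapnik describes, where minimizing empirical risk picks out a function whose average has not converged, shows up in a small simulation. This is a toy setup of our own, not from the talk: labels are pure noise, so every classifier's true error is 0.5, yet the best of 10,000 arbitrary classifiers looks far better on 60 training points.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 60, 10_000   # training examples, number of candidate functions

# Labels are fair coin flips independent of everything else,
# so EVERY candidate function has true error exactly 0.5.
y = rng.integers(0, 2, size=n)

# A rich class: k arbitrary classifiers, each represented by its outputs
# on the n training points (any labeling is some function of distinct x's).
predictions = rng.integers(0, 2, size=(k, n))
empirical_errors = (predictions != y).mean(axis=1)

fixed_error = empirical_errors[0]         # one function chosen before seeing the data
selected_error = empirical_errors.min()   # function chosen by minimizing empirical error
print(f"fixed function:    empirical {fixed_error:.2f}, true 0.50")
print(f"selected function: empirical {selected_error:.2f}, true 0.50")
```

The fixed function's empirical error is an honest estimate (the ordinary law of large numbers applies to it); the selected function's is optimistically biased, and only a bound that holds uniformly over the whole class, which depends on the class's capacity, controls that gap.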
01:16:42.060 | But this idea of convergence has existed
01:16:47.960 | in statistics for a long time.
01:16:50.680 | But it is interesting,
01:16:56.860 | when I think about myself, how stupid I was for 50 years:
01:17:04.920 | I did not see weak convergence.
01:17:07.320 | I worked only on strong convergence.
01:17:10.960 | But now I think that most powerful is weak convergence.
01:17:15.280 | Because it makes admissible set of functions.
01:17:18.880 | And even in all proverbs,
01:17:22.720 | when people try to explain recognition,
01:17:26.440 | the duck test, "looks like a duck" and so on,
01:17:30.280 | they use weak convergence.
01:17:32.400 | People in language, they understand this.
01:17:34.600 | But when we're trying to create artificial intelligence,
01:17:40.840 | we want to invent it in a different way.
01:17:45.080 | We just consider strong convergence.
01:17:48.780 | - So reducing the set of admissible functions,
01:17:52.720 | you think there should be effort put into
01:17:57.720 | understanding the properties of weak convergence?
01:18:01.280 | - You know, in classical mathematics,
01:18:04.760 | in Hilbert space, there are only two ways,
01:18:08.800 | two forms of convergence, strong and weak.
01:18:12.120 | Now we can use both.
01:18:15.760 | That means that we did everything.
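For reference, these are the standard textbook definitions of the two modes of convergence in a Hilbert space $\mathcal{H}$ that Vapnik refers to; the link to learning is his argument, not something the definitions themselves settle. The Cauchy-Schwarz inequality on the last line is why strong convergence implies weak convergence; the converse fails (an orthonormal sequence converges weakly to zero while every term has norm one).

```latex
\text{strong: } \lim_{\ell \to \infty} \| f_\ell - f \| = 0,
\qquad
\text{weak: } \lim_{\ell \to \infty} \langle f_\ell - f, \phi \rangle = 0
\ \text{ for all } \phi \in \mathcal{H},
\qquad
|\langle f_\ell - f, \phi \rangle| \le \| f_\ell - f \| \, \| \phi \|.
```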
01:18:19.600 | And it so happened, when we use Hilbert space,
01:18:27.800 | which is a very rich space, the space of functions
01:18:32.000 | whose square is integrable,
01:18:36.880 | we can apply weak and strong convergence for learning
01:18:42.400 | and have a closed-form solution,
01:18:44.200 | so it is computationally simple.
01:18:47.680 | For me, it is a sign that it is the right way.
01:18:51.080 | Because you don't need any heuristic here,
01:18:55.760 | yes, whatever you want.
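A familiar instance of a closed-form solution in a Hilbert space is kernel ridge regression: in the reproducing-kernel Hilbert space of an RBF kernel, regularized least squares is solved by a single linear system. This sketches that standard technique, not the specific estimator from Vapnik's lecture (which, by his account, combines strong and weak convergence); `gamma` and `lam` are arbitrary illustrative values.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Radial basis function kernel k(x, x') = exp(-gamma * ||x - x'||^2)."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative round-off

def fit_kernel_ridge(x, y, gamma=1.0, lam=1e-3):
    """Minimize sum_i (y_i - f(x_i))^2 + lam * ||f||^2 over the RKHS of the kernel.
    By the representer theorem f = sum_i alpha_i k(x_i, .), and a minimizer is
    the closed-form solution alpha = (K + lam * I)^{-1} y: one linear solve."""
    k = rbf_kernel(x, x, gamma)
    return np.linalg.solve(k + lam * np.eye(len(x)), y)

def predict(x_train, alpha, x_new, gamma=1.0):
    """Evaluate f(x) = sum_i alpha_i k(x_i, x) at new points."""
    return rbf_kernel(x_new, x_train, gamma) @ alpha

x = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])
alpha = fit_kernel_ridge(x, y, gamma=1.0, lam=1e-8)
fitted = predict(x, alpha, x, gamma=1.0)
print(np.round(fitted, 4))
```

No iterative optimizer or tuning heuristic is involved once the kernel and regularizer are fixed; with a tiny `lam` the fit nearly interpolates the four training points.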
01:18:57.720 | But now, the only thing left
01:19:02.520 | is the concept of what a predicate is.
01:19:04.720 | - Of predicate.
01:19:05.560 | - But it is not statistics.
01:19:08.000 | - By the way, I like the fact that you think
01:19:09.760 | that heuristics are a mess that should be removed
01:19:13.280 | from the system.
01:19:14.920 | So closed form solution is the ultimate--
01:19:18.480 | - No, it so happened, that when you're using
01:19:20.840 | the right instrument, you have a closed-form solution.
01:19:26.280 | - Do you think intelligence, human level intelligence,
01:19:31.280 | when we create it, will have something
01:19:35.760 | like a closed form solution?
01:19:41.400 | - You know, now I'm looking at the bounds,
01:19:46.400 | the bounds which I gave for convergence.
01:19:49.560 | And when I look at the bounds,
01:19:53.880 | I think about what the most appropriate kernel
01:19:58.880 | for these bounds would be.
01:20:01.000 | So we know that in, say, all our businesses,
01:20:07.480 | we use radial basis functions.
01:20:09.720 | But looking at the bounds, I think that I start to understand
01:20:16.120 | that maybe we need to make corrections
01:20:18.800 | to radial basis functions so that they
01:20:23.640 | work better for these bounds.
01:20:28.440 | So I'm again trying to understand what type of kernel
01:20:32.560 | has the best approximation,
01:20:37.560 | not an approximation, the best fit to these bounds.
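Whatever corrections to the radial basis function turn out to fit the bounds better (the conversation does not say which), a replacement must still be a valid kernel: every Gram matrix it produces must be positive semidefinite, or the closed-form machinery breaks down. A minimal check, with a hypothetical modified kernel (RBF times a polynomial kernel) and a deliberately invalid one for contrast:

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Standard radial basis function kernel."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def modified_rbf(a, b, gamma=1.0):
    """Hypothetical 'corrected' RBF: RBF times a degree-2 polynomial kernel.
    An elementwise product of valid kernels is valid (Schur product theorem)."""
    return rbf(a, b, gamma) * (1.0 + a @ b.T) ** 2

def squared_distance(a, b):
    """Deliberately NOT a valid kernel: zero diagonal, positive off-diagonal."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

def is_psd(gram, tol=1e-9):
    """All Gram-matrix eigenvalues >= 0 up to round-off (relative tolerance)."""
    eigvals = np.linalg.eigvalsh(gram)
    return bool(eigvals.min() >= -tol * max(1.0, np.abs(eigvals).max()))

rng = np.random.default_rng(1)
x = rng.normal(size=(20, 3))
print(is_psd(rbf(x, x)), is_psd(modified_rbf(x, x)), is_psd(squared_distance(x, x)))
```

Passing on one random sample is only a necessary condition; validity for all inputs has to come from an argument such as the Schur product theorem used for `modified_rbf`.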
01:20:42.560 | - Sure, so there's a lot of interesting work
01:20:45.600 | that could be done in discovering better function
01:20:47.840 | than radial basis functions for--
01:20:50.120 | - Yeah, but--
01:20:50.960 | - For the bounds you find.
01:20:52.800 | - It still comes from, you're looking at the math
01:20:57.800 | and trying to understand what--
01:21:00.240 | - From your own mind, looking at the--
01:21:02.240 | - Yeah, but--
01:21:03.080 | - I don't know--
01:21:03.920 | - Then I try to understand what will be good for that.
01:21:08.920 | - Yeah, but to me there's still a beauty,
01:21:14.000 | again, maybe I'm a descendant of valentorian,
01:21:16.280 | to heuristics.
01:21:18.000 | To me, ultimately, intelligence will be
01:21:20.880 | a mess of heuristics.
01:21:22.340 | And that's the engineering answer, I guess.
01:21:26.320 | - Absolutely.
01:21:27.480 | When you're doing, say, self-driving cars,
01:21:31.080 | the great guy is the one who will do that.
01:21:35.040 | It does not matter what theory is behind that.
01:21:38.640 | What matters is who has a better feeling for how to apply it.
01:21:43.800 | But by the way, it is the same story about predicate.
01:21:50.420 | Because you cannot create a rule for every situation;
01:21:53.880 | there are many more situations than you have rules for.
01:21:56.680 | But maybe you can have more abstract rules,
01:22:03.520 | and then there will be fewer of them.
01:22:07.740 | It is the same story about ideas
01:22:10.800 | and ideas applied to specific cases.
01:22:15.140 | - But still you should--
01:22:17.360 | - You cannot avoid this.
01:22:18.920 | - Yes, of course, but you should still reach
01:22:20.880 | for the ideas to understand the science.
01:22:22.920 | - Let me kind of ask,
01:22:25.280 | do you think neural networks or functions
01:22:29.360 | can be made to reason?
01:22:32.660 | Sort of what do you think,
01:22:35.520 | been talking about intelligence,
01:22:37.120 | but this idea of reasoning.
01:22:39.640 | There's an element of sequentially disassembling,
01:22:44.540 | interpreting the images.
01:22:48.420 | So when you think of handwritten recognition,
01:22:51.860 | we kind of think that there'll be a single,
01:22:55.240 | there's an input and output.
01:22:56.920 | There's not a recurrence.
01:22:58.640 | - Yeah.
01:23:01.080 | - What do you think about sort of the idea of recurrence,
01:23:04.440 | of going back to memory and thinking through this sort of
01:23:07.480 | sequentially mangling the different representations
01:23:12.480 | over and over until you arrive at a conclusion?
01:23:17.940 | Or is ultimately all that can be wrapped up into a function?
01:23:22.940 | - You're suggesting that let us use this type of algorithm.
01:23:28.460 | When I start thinking,
01:23:31.060 | first of all I start to understand what I want.
01:23:35.180 | Can I write down what I want?
01:23:39.560 | And then I try to formalize.
01:23:45.020 | And when I do that, I think I have to solve this problem.
01:23:52.980 | Till now I have not seen a situation where--
01:24:02.380 | - You need recurrence.
01:24:03.700 | - Recurrence.
01:24:04.540 | - But do you observe human beings?
01:24:07.860 | - Yeah.
01:24:08.700 | - Do you try to, it's the imitation question, right?
01:24:12.420 | It seems that human beings reason
01:24:14.900 | this kind of sequentially sort of,
01:24:19.620 | does that inspire in you a thought that we need to add that
01:24:24.140 | into our intelligent systems?
01:24:29.000 | You're saying, okay, I mean, you've kind of answered saying
01:24:34.440 | until now I haven't seen a need for it.
01:24:37.040 | And so because of that,
01:24:38.500 | you don't see a reason to think about it.
01:24:41.900 | - No, most things I don't understand.
01:24:44.980 | Reasoning in humans is too complicated for me.
01:24:50.860 | For me, the most difficult part is to ask questions,
01:24:57.740 | good questions, how it works,
01:25:03.900 | how people ask questions.
01:25:06.820 | I don't know this.
01:25:11.720 | - You said that machine learning's not only
01:25:13.640 | about technical things, speaking of questions,
01:25:16.500 | but it's also about philosophy.
01:25:18.220 | So what role does philosophy play in machine learning?
01:25:23.500 | We talked about Plato, but generally thinking
01:25:26.860 | in this philosophical way,
01:25:29.980 | does it have, how does philosophy and math
01:25:33.860 | fit together in your mind?
01:25:35.240 | - So ideas and then their implementation.
01:25:39.500 | It's like predicate, like say admissible set of functions.
01:25:44.500 | It comes together, everything.
01:25:51.480 | Because the first iteration of theory
01:25:56.480 | was done 50 years ago, it all that, this is theory.
01:26:00.400 | So everything's there.
01:26:02.240 | If you have data, and your set of functions
01:26:08.080 | does not have big capacity,
01:26:13.080 | so low VC dimension, you can do that.
01:26:15.760 | You can make structural risk minimization, control capacity.
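Structural risk minimization can be sketched with the classical VC bound in the form commonly quoted from Vapnik's books: pick, from a nested sequence of function classes, the one minimizing empirical risk plus a capacity term that grows with the VC dimension h. The empirical risks and VC dimensions below are hypothetical numbers chosen for illustration.

```python
import math

def vc_confidence(h: int, n: int, eta: float = 0.05) -> float:
    """Capacity term of the classical VC bound (for h < n): with probability
    at least 1 - eta, R <= R_emp + sqrt((h * (ln(2n/h) + 1) - ln(eta/4)) / n)."""
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

def structural_risk_minimization(emp_risks, vc_dims, n, eta=0.05):
    """Choose among nested classes S_1, S_2, ... (S_k contained in S_{k+1})
    by minimizing the bound: empirical risk + capacity term."""
    bounds = [r + vc_confidence(h, n, eta) for r, h in zip(emp_risks, vc_dims)]
    best = min(range(len(bounds)), key=bounds.__getitem__)
    return vc_dims[best], bounds

# Hypothetical numbers: richer classes fit 1000 training points better,
# but pay for it in the capacity term.
h_best, bounds = structural_risk_minimization(
    emp_risks=[0.30, 0.15, 0.10, 0.09], vc_dims=[2, 10, 50, 200], n=1000)
print(h_best, [round(b, 3) for b in bounds])
```

Empirical risk alone would pick the richest class (h = 200); the bound picks h = 10, which is what "control capacity" means here.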
01:26:19.700 | But we were not able to make the admissible
01:26:26.120 | set of functions good.
01:26:27.980 | Now, when we suddenly realized that we did not use
01:26:33.680 | the other kind of convergence, which we can use,
01:26:38.260 | everything comes together.
01:26:41.500 | - But those are mathematical notions.
01:26:43.340 | Philosophy plays a role of simply saying
01:26:48.020 | that we should be swimming in the space of ideas.
01:26:52.100 | - Let's talk what is philosophy.
01:26:54.320 | Philosophy means understanding of life.
01:26:56.860 | So understanding of life, say people like Plato,
01:27:03.500 | they understand on very high abstract level of life.
01:27:06.800 | So whatever I do, it is just an implementation
01:27:12.660 | of my understanding of life.
01:27:15.660 | But every new step, it is very difficult.
01:27:21.360 | For example, to find this idea that we need
01:27:31.580 | weak convergence was not simple for me.
01:27:36.580 | - So that required thinking about life a little bit.
01:27:43.260 | Hard to trace, but there was some thought process.
01:27:48.860 | - You know, I have been working, thinking about the same problem
01:27:52.980 | for 50 years or more.
01:27:55.420 | And again and again and again.
01:28:00.020 | I try to be honest, and that is very important:
01:28:02.660 | not to be very enthusiastic, but to concentrate
01:28:06.340 | on whatever we were not able to achieve.
01:28:09.460 | - Patient.
01:28:10.300 | - Yeah.
01:28:11.140 | And understand why.
01:28:13.360 | And now I understand that because I believe in math,
01:28:18.900 | I believe in Wigner's idea.
01:28:23.740 | But now when I see that there are only two ways
01:28:28.740 | of convergence and we are using both,
01:28:32.060 | that means that we should be able to do as well as people do.
01:28:37.940 | But now exactly in philosophy and what we know
01:28:44.340 | about predicate, how we understand life,
01:28:47.020 | can we describe as a predicate.
01:28:50.100 | I thought about that and that is more or less obvious.
01:28:57.820 | Level of symmetry.
01:28:59.020 | But next, I have a feeling it's something about structures.
01:29:05.740 | But I don't know how to formulate,
01:29:11.820 | how to measure structure and all that stuff.
01:29:16.180 | And the guy who will solve this challenge problem,
01:29:21.180 | then when we will looking how he did it,
01:29:27.060 | probably just only symmetry is not enough.
01:29:30.340 | - But something like symmetry will be there.
01:29:34.180 | - Oh yeah, absolutely, symmetry will be there.
01:29:37.580 | Level of symmetry will be there.
01:29:39.260 | And level of symmetry, anti-symmetry,
01:29:43.020 | diagonal, vertical, I even don't know how you can use
01:29:48.020 | in different direction idea of symmetry,
01:29:50.660 | it's very general.
01:29:52.300 | But it will be there.
01:29:54.940 | I think that people are very sensitive to idea of symmetry.
01:29:58.580 | But there are several ideas like symmetry.
01:30:02.940 | Which I would like to learn.
01:30:07.020 | But you cannot learn just thinking about that.
01:30:11.820 | You should do challenging problems and then analyze them,
01:30:15.500 | why you were able to solve them.
01:30:20.220 | And then you will see.
01:30:22.740 | Very simple things, it's not easy to find.
01:30:25.420 | Even with talking about this every time.
01:30:30.460 | I was surprised, I tried to understand.
01:30:36.340 | Do people describe, in language, a strong convergence
01:30:41.340 | mechanism for learning?
01:30:43.260 | I did not see, I don't know.
01:30:46.660 | But weak convergence, this duck story,
01:30:50.100 | and stories like that: when you explain to a kid,
01:30:54.700 | you will use weak convergence argument.
01:30:57.620 | It looks like it does like this.
01:30:59.420 | But when you try to formalize, you're just ignoring this.
01:31:05.820 | Why, why 50 years from start of machine learning?
01:31:10.140 | - And that's the role of philosophers.
01:31:11.580 | - I think that might be, I don't know.
01:31:18.300 | Maybe this is serious.
01:31:19.980 | We should blame for that because
01:31:23.660 | empirical risk minimization, and all this stuff.
01:31:27.740 | If you read textbooks now, they are just about bounds
01:31:32.500 | for empirical risk minimization.
01:31:34.380 | They don't look for another problem like admissible set.
01:31:39.380 | - But on the topic of life,
01:31:45.060 | perhaps we, you could talk in Russian for a little bit.
01:31:50.020 | What's your favorite memory from childhood?
01:31:53.260 | (speaking in Russian)
01:31:57.700 | - Music.
01:31:58.540 | - How about, can you try to answer in Russian?
01:32:02.660 | (speaking in Russian)
01:32:07.580 | (speaking in Russian)
01:32:11.860 | (speaking in Russian)
01:32:15.900 | (speaking in Russian)
01:32:20.660 | (speaking in Russian)
01:32:24.580 | (speaking in Russian)
01:32:29.100 | (speaking in Russian)
01:32:33.020 | (speaking in Russian)
01:32:37.900 | (speaking in Russian)
01:33:05.580 | (speaking in Russian)
01:33:09.500 | Now that we're talking about Bach,
01:33:13.100 | let's switch back to English
01:33:15.700 | 'cause I like Beethoven and Chopin, so.
01:33:17.740 | - Chopin, it's another music story.
01:33:21.340 | - But Bach, if we talk about predicates,
01:33:23.980 | Bach probably has the most sort of
01:33:28.980 | well-defined predicates and the like.
01:33:31.500 | - You know, it is very interesting to read
01:33:35.260 | what critics write about Bach,
01:33:38.740 | which words they're using.
01:33:40.460 | They're trying to describe predicates.
01:33:42.860 | And then Chopin, it is very different vocabulary,
01:33:50.820 | very different predicates.
01:33:55.140 | And I think that if you make a collection of that,
01:34:02.700 | so maybe from this you can describe predicates
01:34:05.860 | for digit recognition as well.
01:34:07.660 | - From Bach and Chopin.
01:34:10.420 | - No, no, no, not from Bach and Chopin.
01:34:12.500 | - From the critic interpretation of the music, yeah.
01:34:15.220 | - When they're trying to explain music,
01:34:18.620 | what they use, they describe high-level ideas
01:34:24.740 | of Plato's ideas, what is behind this music.
01:34:28.860 | - That's brilliant.
01:34:29.700 | So art is not self-explanatory in some sense.
01:34:34.700 | So you have to try to convert it into ideas.
01:34:39.060 | - It is an ill-posed problem.
01:34:40.980 | When you go from ideas to the representation,
01:34:45.980 | it is the easy way.
01:34:47.580 | But when you're trying to go back,
01:34:49.580 | it is an ill-posed problem.
01:34:51.420 | But nevertheless, I believe that when you're looking
01:34:55.860 | from that, even from art, you will be able to find
01:35:00.300 | predicates for digit recognition.
01:35:02.060 | - That's such a fascinating and powerful notion.
01:35:07.620 | Do you ponder your own mortality?
01:35:10.580 | Do you think about it, do you fear it,
01:35:13.620 | do you draw insight from it?
01:35:15.060 | - About mortality?
01:35:18.220 | No, yeah.
01:35:20.620 | - Are you afraid of death?
01:35:25.820 | - Not too much.
01:35:26.900 | Not too much.
01:35:29.660 | It is a pity that I will not be able to do some things
01:35:33.740 | which I have a feeling I should do.
01:35:38.740 | For example, I would be very happy to work with
01:35:44.460 | music theoreticians, to write this collection
01:35:52.060 | of descriptions, how they describe music,
01:35:55.060 | how they use predicates.
01:35:56.940 | And from art as well, then take what is in common
01:36:01.940 | and try to understand predicate,
01:36:06.180 | which is absolute for everything.
01:36:08.700 | - And then use that for visual recognition,
01:36:10.500 | see if there is a connection.
01:36:12.620 | - Exactly.
01:36:13.580 | - Well, there's still time, we got time.
01:36:15.580 | (laughing)
01:36:19.380 | We got time.
01:36:20.220 | - It takes years and years and years.
01:36:24.060 | - I think so.
01:36:25.060 | - It's a long way.
01:36:26.460 | - Well, see, you've got the patient mathematician's mind.
01:36:30.900 | I think it could be done very quickly and very beautifully.
01:36:34.060 | I think it's a really elegant idea.
01:36:35.820 | - Yeah, but also--
01:36:36.940 | - Some of many.
01:36:37.780 | - You know, most of the time goes not into making
01:36:41.900 | this collection, but into understanding what is common,
01:36:46.260 | thinking about that again and again and again.
01:36:49.500 | - Again and again and again, but I think sometimes,
01:36:52.660 | especially just when you say this idea now,
01:36:55.700 | even just putting together the collection
01:36:58.780 | and looking at the different sets of data,
01:37:03.300 | language, trying to interpret music,
01:37:05.500 | criticism of music, and images,
01:37:08.740 | I think there'll be sparks of ideas that'll come.
01:37:10.940 | Of course, again and again, you'll come up
01:37:12.660 | with better ideas, but even just that notion
01:37:15.820 | is a beautiful notion.
01:37:16.940 | - I even have some example.
01:37:21.580 | So I had a friend who was a specialist in Russian poetry.
01:37:26.580 | She was a professor of Russian poetry.
01:37:35.260 | She did not write poems, but she knew a lot of stuff.
01:37:40.260 | She wrote several books, and one of them
01:37:49.300 | is a collection of Russian poetry.
01:37:53.500 | She had images of Russian poetry.
01:37:57.100 | She collected all the images of Russian poetry.
01:37:59.340 | And I asked her to do the following.
01:38:03.420 | You have MNIST, digit recognition,
01:38:08.500 | and we took 100 digits, or maybe less than 100.
01:38:14.660 | I don't remember, maybe 50 digits.
01:38:18.860 | And try, from a poetical point of view,
01:38:21.660 | to describe every image which she sees,
01:38:25.260 | using only words from the images of Russian poetry.
01:38:30.260 | And she did it.
01:38:32.220 | And then we tried to,
01:38:37.580 | I call it learning using privileged information.
01:38:43.620 | I call it privileged information.
01:38:45.900 | You have two languages.
01:38:48.060 | One language is just the image of the digit,
01:38:53.060 | and the other language is the poetic description of this image.
01:38:56.620 | And this is privileged information.
01:39:00.060 | And there is an algorithm: when you're working
01:39:04.500 | using privileged information, you're doing better.
01:39:07.460 | Much better, so.
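Vapnik's own algorithm for learning using privileged information is SVM+, which is too long to reproduce here. As a rough sketch of the same setup (extra features available only at training time), here is the simpler "generalized distillation" recipe (Lopez-Paz, Bottou, Schölkopf and Vapnik, 2016), with logistic regression standing in for both models. The synthetic data are our own assumption: the privileged view `z` is a clean signal, playing the role the poetic descriptions played, and the student only ever sees a noisy copy `x`.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def add_bias(x):
    return np.hstack([x, np.ones((len(x), 1))])

def fit_logistic(x, targets, lr=0.5, steps=2000):
    """Logistic regression by gradient descent on mean cross-entropy.
    Targets may be soft (any value in [0, 1])."""
    xb = add_bias(x)
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = sigmoid(xb @ w)
        w -= lr * xb.T @ (p - targets) / len(x)
    return w

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=(n, 1))               # privileged view x*: available only in training
y = (z[:, 0] > 0).astype(float)           # labels
x = z + rng.normal(size=(n, 1))           # ordinary view: the only input at test time

# 1) Teacher learns from the privileged view.
teacher = fit_logistic(z, y)
# 2) Teacher's soft labels, softened by a temperature T.
T = 2.0
soft = sigmoid(add_bias(z) @ teacher / T)
# 3) Student learns from x. Cross-entropy is linear in its target, so the
#    objective (1 - lam) * CE(y, p) + lam * CE(soft, p) equals CE with the mixed target.
lam = 0.5
student = fit_logistic(x, (1 - lam) * y + lam * soft)

accuracy = np.mean((sigmoid(add_bias(x) @ student) > 0.5) == (y == 1))
print(f"student training accuracy: {accuracy:.2f}")
```

This only illustrates the training protocol; it makes no claim about how much the privileged view helps on this toy data, which a fair test would measure on held-out points against a student trained on `y` alone.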
01:39:10.380 | - So there's something there.
01:39:11.580 | - Something there.
01:39:12.860 | And there is, and you see,
01:39:16.980 | she unfortunately died.
01:39:19.020 | The collection of digits and poetic descriptions
01:39:25.900 | of these digits.
01:39:27.260 | - So there's something there in that poetic description.
01:39:32.900 | - But I think that there are abstract ideas
01:39:37.900 | on the Plato level of ideas.
01:39:40.700 | - Yeah, that they're there, that could be discovered.
01:39:43.140 | And music seems to be a good entry point.
01:39:45.060 | - As soon as we start this challenge problem.
01:39:50.060 | - The challenge problem.
01:39:51.180 | - It immediately connected to all this stuff.
01:39:55.420 | - Especially with your talk and this podcast,
01:39:58.060 | and I'll do whatever I can to advertise it.
01:40:00.100 | It's such a clean, beautiful Einstein-like formulation
01:40:03.260 | of the challenge before us.
01:40:05.220 | - Right.
01:40:06.060 | - Let me ask another absurd question.
01:40:09.500 | We talked about mortality.
01:40:12.780 | We talked about philosophy of life.
01:40:14.660 | What do you think is the meaning of life?
01:40:16.660 | What's the predicate for mysterious existence
01:40:22.540 | here on Earth?
01:40:23.980 | - I don't know.
01:40:30.580 | It's very interesting.
01:40:34.740 | We have in Russia, I don't know if you know,
01:40:39.980 | the guy Strugatsky.
01:40:43.100 | They are writing pictures, they're thinking about
01:40:47.860 | human, what's going on.
01:40:49.740 | And they have idea that there are,
01:40:56.660 | they're developing two types of people.
01:41:01.860 | Common people and very smart people.
01:41:05.100 | They just started.
01:41:06.100 | And these two branches of people
01:41:09.860 | will go in different directions very soon.
01:41:13.180 | So that's what they're thinking about.
01:41:15.940 | (laughing)
01:41:18.220 | - So the purpose of life is to create two paths.
01:41:23.220 | - Two paths.
01:41:24.660 | - Of human societies.
01:41:25.980 | - Yes.
01:41:27.020 | Simple people and more complicated people.
01:41:29.980 | - Which do you like best?
01:41:31.540 | The simple people or the complicated ones?
01:41:34.500 | - I don't know.
01:41:35.340 | That is just his fantasy.
01:41:38.260 | But you know, every week we have a guy
01:41:41.700 | who is just a writer, and also,
01:41:46.700 | so it's called literature.
01:41:50.820 | And he explains how he understands literature
01:41:56.580 | and human relationships, how he sees life.
01:42:00.300 | And I understood that I'm just a small kid
01:42:05.980 | compared to him.
01:42:09.500 | He's a very smart guy in understanding life.
01:42:12.620 | He knows this predicate, he knows big blocks of life.
01:42:18.860 | I am amazed every time I listen to him.
01:42:23.300 | And he's just talking about literature.
01:42:27.380 | And I think that I was surprised.
01:42:31.380 | So the managers in big companies,
01:42:39.180 | most of them are guys who studied English language
01:42:44.180 | and English literature.
01:42:50.020 | So why?
01:42:52.500 | Because they understand life.
01:42:54.820 | They understand models.
01:42:57.020 | And among them, maybe many talented critics
01:43:01.700 | who are just analyzing this.
01:43:06.660 | And this is big science, like Propp did.
01:43:10.500 | These are blocks.
01:43:12.380 | Yeah, very smart.
01:43:15.340 | - It amazes me that you are and continue to be humbled
01:43:21.500 | by the brilliance of others.
01:43:22.940 | - I'm very modest about myself.
01:43:25.540 | I see so smart guys around.
01:43:28.940 | - Well, let me be immodest for you.
01:43:31.740 | You're one of the greatest mathematicians,
01:43:33.900 | statisticians of our time.
01:43:35.820 | It's truly an honor.
01:43:36.980 | Thank you for talking again.
01:43:38.580 | And let's talk.
01:43:39.500 | - It is not.
01:43:42.140 | - Yeah, let's talk.
01:43:43.460 | - I know my limits.
01:43:44.860 | - Let's talk again when your challenge is taken on
01:43:49.140 | and solved by a grad student.
01:43:51.860 | Especially-- - Let's talk again.
01:43:53.900 | - When they use it. - I hope this happens.
01:43:56.140 | - Maybe music will be involved.
01:43:58.900 | Vladimir, thank you so much.
01:43:59.900 | It's been an honor. - Thank you very much.
01:44:02.620 | - Thanks for listening to this conversation
01:44:04.220 | with Vladimir Vapnik.
01:44:05.540 | And thank you to our presenting sponsor, Cash App.
01:44:08.780 | Download it, use code LEXPODCAST.
01:44:11.420 | You'll get $10 and $10 will go to FIRST,
01:44:14.340 | an organization that inspires and educates young minds
01:44:17.060 | to become science and technology innovators of tomorrow.
01:44:20.740 | If you enjoy this podcast, subscribe on YouTube,
01:44:23.500 | give us five stars on Apple Podcasts,
01:44:25.340 | support on Patreon, or simply connect with me on Twitter
01:44:28.820 | at Lex Fridman.
01:44:30.300 | And now let me leave you with some words
01:44:33.500 | from Vladimir Vapnik.
01:44:35.580 | When solving a problem of interest,
01:44:37.740 | do not solve a more general problem as an intermediate step.
01:44:41.660 | Thank you for listening.
01:44:44.340 | I hope to see you next time.
01:44:46.220 | (upbeat music)