Vladimir Vapnik: Predicates, Invariants, and the Essence of Intelligence

00:00:00.000 | The following is a conversation with Vladimir Vapnik,

00:00:03.220 | part two, the second time we spoke on the podcast.

00:00:07.200 | He's the co-inventor of support vector machines,

00:00:09.780 | support vector clustering, VC theory,

00:00:12.120 | and many foundational ideas in statistical learning.

00:00:14.960 | He was born in the Soviet Union,

00:00:17.300 | worked at the Institute of Control Sciences in Moscow,

00:00:20.260 | then in the US, worked at AT&T, NEC Labs,

00:00:24.700 | Facebook AI Research,

00:00:26.120 | and now is a professor at Columbia University.

00:00:29.400 | His work has been cited over 200,000 times.

00:00:33.040 | The first time we spoke on the podcast

00:00:34.880 | was just over a year ago, one of the early episodes.

00:00:39.000 | This time, we spoke after a lecture he gave

00:00:41.460 | titled "Complete Statistical Theory of Learning"

00:00:44.440 | as part of the MIT series of lectures on deep learning

00:00:47.360 | and AI that I organized.

00:00:50.200 | I'll release the video of the lecture in the next few days.

00:00:53.720 | This podcast and the lecture are independent from each other

00:00:56.840 | so you don't need one to understand the other.

00:00:59.420 | The lecture is quite technical and math heavy,

00:01:03.100 | so if you do watch both,

00:01:04.360 | I recommend listening to this podcast first

00:01:06.800 | since the podcast is probably a bit more accessible.

00:01:10.180 | This is the Artificial Intelligence Podcast.

00:01:14.000 | If you enjoy it, subscribe on YouTube,

00:01:16.080 | give it five stars on Apple Podcast,

00:01:17.920 | support it on Patreon,

00:01:19.160 | or simply connect with me on Twitter

00:01:21.080 | at Lex Fridman, spelled F-R-I-D-M-A-N.

00:01:24.760 | As usual, I'll do one or two minutes of ads now

00:01:27.880 | and never any ads in the middle

00:01:29.320 | that can break the flow of the conversation.

00:01:31.460 | I hope that works for you

00:01:32.880 | and doesn't hurt the listening experience.

00:01:35.000 | This show is presented by Cash App,

00:01:38.200 | the number one finance app in the App Store.

00:01:40.600 | When you get it, use code LEXPODCAST.

00:01:43.800 | Cash App lets you send money to friends,

00:01:45.840 | buy Bitcoin, and invest in the stock market

00:01:48.120 | with as little as $1.

00:01:49.880 | Brokerage services are provided by Cash App Investing,

00:01:52.640 | a subsidiary of Square and member SIPC.

00:01:56.480 | Since Cash App allows you to send

00:01:58.000 | and receive money digitally peer-to-peer,

00:02:00.520 | and security in all digital transactions is very important,

00:02:03.600 | let me mention the PCI Data Security Standard,

00:02:06.760 | PCI DSS Level 1, that Cash App is compliant with.

00:02:11.760 | I'm a big fan of standards for safety and security,

00:02:15.280 | and PCI DSS is a good example of that,

00:02:18.940 | where a bunch of competitors got together

00:02:21.020 | and agreed that there needs to be a global standard

00:02:23.480 | around the security of transactions.

00:02:25.800 | Now, we just need to do the same for autonomous vehicles

00:02:28.880 | and AI systems in general.

00:02:30.480 | So again, if you get Cash App from the App Store

00:02:33.880 | or Google Play, and use the code LEXPODCAST,

00:02:37.280 | you get $10, and Cash App will also donate $10 to FIRST,

00:02:41.240 | one of my favorite organizations

00:02:43.120 | that is helping to advance robotics and STEM education

00:02:46.420 | for young people around the world.

00:02:49.720 | And now, here's my conversation with Vladimir Vapnik.

00:02:54.140 | You and I talked about Alan Turing yesterday a little bit.

00:02:58.540 | - Yes.

00:02:59.780 | - And that he, as the father of artificial intelligence,

00:03:02.660 | may have instilled in our field an ethic of engineering,

00:03:05.700 | and not science, seeking more to build intelligence

00:03:09.540 | rather than to understand it.

00:03:11.940 | What do you think is the difference

00:03:13.300 | between these two paths of engineering intelligence

00:03:17.700 | and the science of intelligence?

00:03:21.220 | - It's a completely different story.

00:03:23.660 | Engineering is imitation of human activity.

00:03:27.620 | You have to make a device which behave as human behave,

00:03:33.500 | have all the functions of human.

00:03:39.180 | It does not matter how you do it.

00:03:40.880 | But to understand what is intelligence about

00:03:46.020 | is quite different problem.

00:03:47.700 | So I think, I believe, that it's somehow related

00:03:53.880 | to predicate we talked yesterday about.

00:03:56.380 | Because, look at Vladimir Propp's idea.

00:04:02.860 | He just found 31 here predicates.

00:04:12.660 | He call it units, which can explain human behavior,

00:04:17.660 | at least in Russian tales.

00:04:20.780 | He look at Russian tales and derive from that.

00:04:24.940 | And then people realize that it more wide

00:04:27.460 | than in Russian tales.

00:04:29.580 | It is in TV, in movie serials, and so on and so on.

00:04:33.940 | - So you're talking about Vladimir Propp.

00:04:37.820 | - Right.

00:04:38.660 | - Who in 1928 published a book,

00:04:40.020 | Morphology of the Folktale, describing 31 predicates

00:04:44.260 | that have this kind of sequential structure

00:04:48.700 | that a lot of the stories, narratives follow

00:04:52.860 | in Russian folklore and in other content.

00:04:55.040 | We'll talk about it.

00:04:56.100 | I'd like to talk about predicates in a focused way.

00:04:59.180 | But let me, if you'll allow me to stay zoomed out

00:05:02.180 | on our friend Alan Turing.

00:05:03.740 | And he inspired a generation with the imitation game.

00:05:10.180 | - Yes.

00:05:11.660 | - Do you think, if we can linger on that a little bit longer,

00:05:15.220 | do you think we can learn, do you think learning

00:05:19.620 | to imitate intelligence can get us closer

00:05:22.380 | to understanding intelligence?

00:05:25.460 | So why do you think imitation is so far from understanding?

00:05:30.460 | - I think that it is different.

00:05:34.620 | Between you have different goals.

00:05:37.540 | So your goal is to create something, something useful.

00:05:42.540 | And that is great.

00:05:45.980 | And you can see how much things was done

00:05:49.740 | and I believe that it will be done even more.

00:05:52.340 | You have self-driving cars and also this business.

00:05:56.900 | It is great.

00:05:58.420 | And it was inspired by Turing vision.

00:06:02.080 | But understanding is very difficult.

00:06:05.500 | It's more or less philosophical category.

00:06:08.360 | What means understand the world?

00:06:11.300 | I believe in scheme which starts from Plato

00:06:15.820 | that there exists world of ideas.

00:06:19.220 | I believe that intelligence, it is world of ideas.

00:06:22.940 | But it is world of pure ideas.

00:06:25.900 | And when you combine them with reality things,

00:06:33.220 | it creates, as in my case, invariants,

00:06:36.540 | which is very specific.

00:06:38.580 | And that I believe the combination of ideas

00:06:43.580 | in way to constructing invariant is intelligence.

00:06:49.820 | But first of all, predicate.

00:06:53.340 | If you know predicate, and hopefully

00:06:56.180 | then not too much predicate exist.

00:07:00.820 | For example, 31 predicates for human behavior,

00:07:04.260 | it is not a lot.

00:07:06.060 | - Vladimir Propp used 31,

00:07:08.780 | you can even call them predicates,

00:07:12.340 | 31 predicates to describe stories, narratives.

00:07:17.340 | - Right.

00:07:18.380 | - So you think human behavior,

00:07:19.380 | how much of human behavior, how much of our world,

00:07:23.100 | our universe, all the things that matter in our existence

00:07:27.540 | can be summarized in predicates of the kind

00:07:30.860 | that Propp was working with?

00:07:32.660 | - I think that we have a lot of formal behavior.

00:07:36.980 | But I think that predicate is much less.

00:07:41.020 | Because even in this example, which I gave you yesterday,

00:07:45.260 | you saw that predicate can be,

00:07:50.260 | one predicate can construct many different invariants,

00:07:56.700 | depending on your data.

00:07:59.380 | They're applying to different data,

00:08:01.620 | and they give different invariants.

00:08:03.740 | But pure ideas, maybe not so much.

00:08:08.660 | - Not so many.

00:08:09.940 | - I don't know about that.

00:08:11.420 | But my guess, I hope, that's why challenge

00:08:15.060 | about digit recognition, how much you need.

00:08:18.780 | - I think we'll talk about computer vision

00:08:21.900 | and 2D images a little bit in your challenge.

00:08:24.820 | - That's exactly about intelligence.

00:08:26.780 | - That's exactly, that's exactly about,

00:08:30.820 | no, that hopes to be exactly about the spirit

00:08:34.460 | of intelligence in the simplest possible way.

00:08:37.540 | - Yeah, absolutely, you should start the simplest way.

00:08:40.380 | Otherwise you will not be able to do it.

00:08:42.700 | - Well, there's an open question whether starting

00:08:45.540 | at the MNIST digit recognition is a step

00:08:49.220 | towards intelligence, or it's an entirely different thing.

00:08:52.700 | - I think that to beat records using, say,

00:08:56.820 | 100, 200 times less examples, you need intelligence.

00:09:00.700 | - You need intelligence.

00:09:01.540 | So let's, because you use this term,

00:09:03.780 | and it would be nice, I'd like to ask simple,

00:09:07.260 | maybe even dumb questions.

00:09:09.980 | Let's start with a predicate.

00:09:11.500 | In terms of terms and how you think about it,

00:09:14.940 | what is a predicate?

00:09:16.100 | - I don't know.

00:09:18.860 | I have a feeling formulas, they exist.

00:09:22.860 | But I believe that predicate for 2D images,

00:09:27.860 | one of them is symmetry.

00:09:32.300 | - Hold on a second, sorry.

00:09:33.520 | Sorry to interrupt and pull you back.

00:09:36.460 | At the simplest level, we're not being profound currently.

00:09:40.700 | A predicate is a statement of something that is true.

00:09:44.020 | - Yes.

00:09:45.760 | - Do you think of predicates as somehow

00:09:50.540 | probabilistic in nature, or is this binary,

00:09:54.700 | this is truly constraints of logical statements

00:09:59.020 | about the world?

00:10:00.220 | - In my definition, the simplest predicate is function.

00:10:04.180 | Function, and you can use this function

00:10:07.580 | to make inner product, that is predicate.

00:10:10.900 | - What's the input, and what's the output of the function?

00:10:14.020 | - Input is x, something which is input in reality.

00:10:18.660 | Say if you consider digit recognition,

00:10:22.460 | it's pixel space, input.

00:10:25.020 | But it is function which in pixel space.

00:10:29.800 | But it can be any function from pixel space.

00:10:34.660 | And you choose, and I believe that there are

00:10:39.500 | several functions which is important

00:10:42.940 | for understanding of images.

00:10:46.440 | One of them is symmetry.

00:10:48.280 | It's not so simple construction,

00:10:50.980 | as I described with linearity, with all this stuff.

00:10:55.020 | But another, I believe, I don't know how many,

00:10:58.860 | is how well-structurized is picture.

00:11:03.260 | - Structurized?

00:11:04.340 | - Yeah.

00:11:05.180 | - What do you mean by structurized?

00:11:06.980 | - It is formal definition, say something heavy

00:11:11.660 | on the left corner, not so heavy in the middle, and so on.

00:11:17.060 | You describe in general concept of what you assume.

00:11:21.860 | - Concepts, some kind of universal concepts.

00:11:25.240 | - Yeah.

00:11:26.700 | But I don't know how to formalize this.

00:11:29.200 | - Do you, so this is the thing.

00:11:31.600 | There's a million ways we can talk about this.

00:11:33.640 | I'll keep bringing it up.

00:11:34.680 | But we humans have such concepts,

00:11:38.180 | when we look at digits.

00:11:41.560 | But it's hard to put them, just like you're saying now,

00:11:43.900 | it's hard to put them into words.

00:11:46.000 | - You know, that is example.

00:11:47.760 | When critics in music, trying to describe music,

00:11:53.640 | they use predicate.

00:11:57.100 | And not too many predicate, but in different combination.

00:12:02.880 | But they have some special words for describing music.

00:12:07.880 | And the same should be for images.

00:12:12.720 | But maybe there are critics who understand

00:12:15.680 | essence of what this image is about.

00:12:19.560 | - Do you think there exists critics

00:12:23.640 | who can summarize the essence of images, human beings?

00:12:28.640 | - I hope so, yes.

00:12:31.900 | But--

00:12:32.740 | - Explicitly state them on paper.

00:12:34.960 | The fundamental question I'm asking is,

00:12:42.600 | do you think there exists a small set of predicates

00:12:46.240 | that will summarize images?

00:12:48.080 | It feels to our mind like it does,

00:12:51.240 | that the concept of what makes a two and a three and a four.

00:12:56.000 | - No, no, no, it's not on this level.

00:12:58.920 | What, it should not describe two, three, four.

00:13:04.960 | It describes some construction

00:13:07.680 | which allow you to create invariants.

00:13:11.960 | - And invariants, sorry to stick on this, but terminology.

00:13:16.160 | - Invariants, it is property of your image.

00:13:21.160 | I can say, looking at my image, it is more or less symmetric.

00:13:31.640 | And I can give you a value of symmetry.

00:13:34.160 | Say, level of symmetry, using this function

00:13:39.400 | which I gave yesterday.

00:13:42.240 | And you can describe that your image

00:13:47.240 | has these characteristics,

00:13:51.600 | exactly in the way how musical critics describe music.

00:13:56.600 | So, but this is invariant applied to specific data,

00:14:02.680 | to specific music, to something.

00:14:07.720 | I strongly believe in these Plato ideas

00:14:12.440 | that there exists world of predicate

00:14:15.000 | and world of reality and predicate and reality

00:14:18.920 | is somehow connected and you have to know that.

00:14:22.480 | - Let's talk about Plato a little bit.

00:14:24.020 | So you draw a line from Plato to Hegel to Wigner to today.

00:14:29.020 | - Yes.

00:14:30.280 | - So Plato has forms, the theory of forms.

00:14:35.560 | There's a world of ideas and a world of things

00:14:38.600 | as you talk about and there's a connection.

00:14:40.440 | And presumably the world of ideas is very small

00:14:44.720 | and the world of things is arbitrarily big.

00:14:48.060 | But they're all, what Plato calls them,

00:14:49.840 | like it's a shadow, the real world is a shadow

00:14:54.040 | from the world of forms.

00:14:54.880 | - Yeah, you have projection.

00:14:56.840 | - Projection.

00:14:57.680 | - Of world of idea.

00:14:59.240 | - Yeah, very poetic.

00:15:00.720 | - In reality you can realize this projection

00:15:04.840 | using these invariants because it is projection

00:15:09.320 | on specific examples which creates specific features

00:15:13.680 | of specific objects.

00:15:15.120 | - So the essence of intelligence is while only being able

00:15:22.400 | to observe the world of things,

00:15:24.720 | try to come up with a world of ideas.

00:15:27.040 | - Exactly, like in this music story.

00:15:30.000 | Intelligent musical critics knows all this world

00:15:33.160 | and have a feeling about what--

00:15:34.800 | - I feel like that's a contradiction,

00:15:36.360 | intelligent music critics.

00:15:39.160 | I think music is to be

00:15:42.080 | enjoyed in all its forms.

00:15:47.720 | The notion of critic like a food critic.

00:15:50.040 | - No, I don't want that emotion.

00:15:52.360 | - That's an interesting question.

00:15:53.760 | Does emotion, there's certain elements

00:15:56.720 | of the human psychology, of the human experience

00:16:00.200 | which seem to almost contradict intelligence and reason.

00:16:05.200 | Like emotion, like fear, like love, all of those things.

00:16:10.520 | Are those not connected in any way to the space of ideas?

00:16:16.440 | - That I don't know.

00:16:18.760 | I just want to be concentrate on very simple story,

00:16:25.280 | on digit recognition.

00:16:27.960 | - So you don't think you have to love and fear death

00:16:30.480 | in order to recognize digits?

00:16:32.840 | - I don't know.

00:16:34.560 | Because it's so complicated.

00:16:37.000 | It involves a lot of stuff which I never consider.

00:16:41.600 | But I know about digit recognition.

00:16:44.280 | And I know that for digit recognition

00:16:49.320 | to get the records from small number of observations,

00:16:57.360 | you need predicate.

00:16:59.400 | But not special predicate for this problem.

00:17:02.760 | But universal predicate which understand world of images.

00:17:08.600 | - Of visual information.

00:17:09.760 | - Visual, yeah.

00:17:11.400 | But on the first step, they understand, say,

00:17:15.880 | world of handwritten digits or characters

00:17:19.800 | or something simple.

00:17:21.640 | - So like you said, symmetry is an interesting one.

00:17:23.960 | - That's what I think one of the predicates

00:17:27.560 | related to symmetry.

00:17:29.400 | The level of symmetry.

00:17:30.960 | - Okay, degree of symmetry.

00:17:32.160 | So you think symmetry at the bottom is a universal notion

00:17:37.160 | and there's degrees of a single kind of symmetry

00:17:41.520 | or is there many kinds of symmetries?

00:17:44.200 | - Many kinds of symmetries.

00:17:46.040 | There is a symmetry, anti-symmetry, say, letter S.

00:17:52.400 | So it has vertical anti-symmetry.

00:17:56.340 | And it could be diagonal symmetry, vertical symmetry.

00:18:02.680 | - So when you cut vertically the letter S.

00:18:07.680 | - Yeah, then the upper part and lower part

00:18:12.800 | in different directions.

00:18:15.720 | - Yeah, inverted along the Y axis.

00:18:18.960 | But that's just like one example of symmetry, right?

00:18:21.280 | Isn't there like--

00:18:22.120 | - Right, but there is a degree of symmetry.

00:18:26.360 | If you play all this derivative stuff

00:18:29.160 | to do tangent distance,

00:18:34.160 | whatever I describe, you can have a degree of symmetry.

00:18:40.040 | And that is what describing reason of image.

00:18:45.480 | It is the same as you will describe this image.

00:18:51.920 | Same about digits, it has anti-symmetry.

00:18:56.920 | Digits three is symmetric, more or less look for symmetry.

00:19:02.920 | - Do you think such concepts like symmetry,

00:19:07.840 | predicates like symmetry,

00:19:09.840 | is it a hierarchical set of concepts?

00:19:14.320 | Or are these independent, distinct predicates

00:19:20.080 | that we want to discover, some set of?

00:19:23.640 | - No, there is idea of symmetry.

00:19:26.000 | And you can, this idea of symmetry,

00:19:29.200 | make very general, like degree of symmetry.

00:19:35.240 | A degree of symmetry can be zero, no symmetry at all.

00:19:40.720 | Or degree of symmetry, say, more or less symmetrical.

00:19:47.000 | But you have one of these descriptions.

00:19:50.520 | And symmetry can be different.

00:19:52.520 | As I told, horizontal, vertical, diagonal,

00:19:56.360 | and anti-symmetry is also a concept of symmetry.
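
As a rough illustration of a graded "degree of symmetry", the sketch below scores how closely a small grayscale image matches a transformed copy of itself. It is not the function referred to in the conversation, which is described as built on derivatives and tangent distance. The names `symmetry_degree`, `flip_horizontal`, `rotate_180`, the scoring formula, and the toy images are all assumptions chosen for simplicity. Treating anti-symmetry as invariance under a 180-degree rotation is likewise only one possible reading of the letter S example.

```python
def flip_horizontal(img):
    """Mirror the image left to right (reflect across the vertical axis)."""
    return [row[::-1] for row in img]


def rotate_180(img):
    """Rotate by 180 degrees: reverse the row order and reverse each row."""
    return [row[::-1] for row in img[::-1]]


def inner(a, b):
    """Pixel-wise inner product of two images of the same shape."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def symmetry_degree(img, transform):
    """Score in [0, 1] for non-negative pixel values.

    1.0 means the image equals its transformed copy; lower values mean
    less overlap between the image and its transformed copy.
    """
    norm = inner(img, img)
    if norm == 0:
        return 1.0  # an all-zero image is trivially symmetric
    return inner(img, transform(img)) / norm


# Hypothetical 3x3 toy images (1 = ink, 0 = background).
T_SHAPE = [[1, 1, 1],
           [0, 1, 0],
           [0, 1, 0]]

S_SHAPE = [[0, 1, 1],
           [0, 1, 0],
           [1, 1, 0]]

for name, img in (("T", T_SHAPE), ("S", S_SHAPE)):
    print(name,
          "mirror:", symmetry_degree(img, flip_horizontal),
          "rotation:", symmetry_degree(img, rotate_180))
```

In this toy setup the T shape is fully mirror-symmetric, while the S shape is only partly mirror-symmetric but fully invariant under rotation, which gives each shape a different profile of symmetry values.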

00:20:01.360 | - What about shape in general?

00:20:03.320 | I mean, symmetry is a fascinating notion, but--

00:20:05.840 | - No, no, I'm talking about digits.

00:20:08.640 | I would like to concentrate on all,

00:20:11.400 | I would like to know predicates for digit recognition.

00:20:14.520 | - Yes, but symmetry is not enough

00:20:17.000 | for digit recognition, right?

00:20:19.400 | - It is not necessarily for digit recognition.

00:20:22.560 | It helps to create invariant,

00:20:26.800 | which you can use when you will have examples

00:20:31.800 | for digit recognition.

00:20:35.040 | You have regular problem of digit recognition,

00:20:38.320 | you have examples of the first class, or second class.

00:20:41.640 | Plus, you know that there exists concept of symmetry.

00:20:45.880 | And you apply when you're looking for decision rule,

00:20:50.440 | you will apply concept of symmetry

00:20:55.440 | of this level of symmetry, which you estimate from.

00:21:00.160 | So let's talk, everything comes from weak convergence.

00:21:05.160 | - What is convergence, what is weak convergence,

00:21:09.280 | what is strong convergence?

00:21:11.440 | I'm sorry, I'm gonna do this to you.

00:21:13.400 | What are we converging from and to?

00:21:15.320 | - You're converging, you would like to have a function.

00:21:20.520 | The function which, say indicator function,

00:21:23.640 | which indicate your digit five, for example.

00:21:28.640 | - A classification task?

00:21:31.520 | - Let's talk only about classification.

00:21:33.800 | - So classification means you will say

00:21:36.880 | whether this is a five or not,

00:21:38.640 | or say which of the 10 digits it is.

00:21:40.680 | - Right, right.

00:21:42.160 | I would like to have these functions.

00:21:45.600 | Then, I have some examples.

00:21:51.600 | I can consider property of these examples.

00:22:00.240 | Say symmetry, and I can measure level of symmetry

00:22:04.880 | for every digit.

00:22:08.060 | And then I can take average from my training data,

00:22:13.060 | and I will consider only functions

00:22:19.660 | of conditional probability,

00:22:24.020 | which I'm looking for my decision rule.

00:22:26.340 | Which applying to digits,

00:22:36.460 | will give me the same average as I observe on training data.

00:22:40.780 | So actually, this is different level

00:22:45.380 | of description of what you want.

00:22:48.500 | You want not just, you show not one digit.

00:22:53.500 | You show this predicate, show general property

00:22:58.860 | of all digits which you have in mind.

00:23:03.740 | If you have in mind digit three,

00:23:06.080 | it gives you property of digit three,

00:23:10.380 | and you select as admissible set of function,

00:23:13.580 | only function, which keeps this property.

00:23:16.980 | You will not consider other functions.

00:23:20.760 | So you're immediately looking for smaller subset of function.

00:23:24.940 | - That's what you mean by admissible functions.

00:23:27.140 | - Admissible function, exactly.
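
One possible reading of the admissible-set idea: a predicate is a function psi on inputs, and a candidate rule f is kept only if the psi-weighted average of its outputs on the training inputs matches the psi-weighted average of the training labels. The sketch below checks that condition on toy data. The exact form of the condition, the tolerance, the data, the two predicates, and every name here are assumptions for illustration and are not taken from the conversation.

```python
def predicate_average(psi, xs, values):
    """Average of psi(x_i) * v_i over the sample."""
    return sum(psi(x) * v for x, v in zip(xs, values)) / len(xs)


def keeps_invariant(f, psi, xs, ys, tol=0.05):
    """True if f reproduces the predicate average observed on the labels."""
    from_rule = predicate_average(psi, xs, [f(x) for x in xs])
    from_data = predicate_average(psi, xs, ys)
    return abs(from_rule - from_data) <= tol


# Toy 1-D training set: label is 1 when x > 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

# Two hypothetical predicates: a constant and the input itself.
predicates = {
    "constant": lambda x: 1.0,
    "first_moment": lambda x: x,
}

# Candidate decision rules (outputs play the role of P(y = 1 | x)).
candidates = {
    "threshold_at_0": lambda x: 1.0 if x > 0 else 0.0,
    "always_half": lambda x: 0.5,
    "threshold_at_1.5": lambda x: 1.0 if x > 1.5 else 0.0,
}

# Rules that keep the invariant for the constant predicate only.
admissible_one = [
    name for name, f in candidates.items()
    if keeps_invariant(f, predicates["constant"], xs, ys)
]

# Rules that keep the invariants for both predicates.
admissible_both = [
    name for name, f in candidates.items()
    if all(keeps_invariant(f, psi, xs, ys) for psi in predicates.values())
]

print("one predicate:", admissible_one)
print("two predicates:", admissible_both)
```

With the constant predicate alone, two of the three candidates survive. Adding the second predicate removes the rule that ignores the input, which is the sense in which each predicate shrinks the set of functions under consideration.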

00:23:28.420 | - Which is still a pretty large,

00:23:30.940 | for the number three, it's a large--

00:23:32.780 | - It's pretty large, but if you have one predicate.

00:23:36.600 | But according to, there is a strong and weak convergence.

00:23:41.600 | Strong convergence is convergence in function.

00:23:45.280 | You're looking for the function, on one function,

00:23:49.240 | and you're looking for another function.

00:23:51.900 | And square difference from them should be small.

00:23:56.900 | If you take difference in any points,

00:24:01.880 | make a square, make an integral, and it should be small.

00:24:05.640 | That is convergence in function.

00:24:08.040 | Suppose you have some function, any function.

00:24:11.280 | So I would say, I say that some function

00:24:15.440 | converge to this function.

00:24:16.960 | If integral from square difference between them is small.

00:24:22.880 | - That's the definition of strong convergence.

00:24:24.800 | - That definition of strong convergence.

00:24:25.800 | - Two functions, the integral of the difference is small.

00:24:28.960 | - It is convergence in functions.
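
The verbal definition above can be written compactly as follows. The symbols are assumptions introduced here: a sequence of functions f_l, a target function f_0, and integration over the input space.

```latex
% Strong convergence (convergence in functions), as described above:
% the integrated squared difference between f_\ell and f_0 vanishes.
\lim_{\ell \to \infty} \int \bigl( f_\ell(x) - f_0(x) \bigr)^2 \, dx = 0
```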

00:24:31.160 | - Yeah.

00:24:32.320 | - But you have different convergence in functionals.

00:24:36.760 | You take any function, you take some function, phi,

00:24:40.120 | and take inner product, this function with f function.

00:24:45.120 | F zero function, which you want to find.

00:24:50.360 | And that gives you some value.

00:24:52.000 | So you say that set of functions converge

00:25:00.080 | in inner product to this function,

00:25:03.080 | if this value of inner product converge to value f zero.

00:25:08.080 | That is for one phi.

00:25:12.520 | But weak convergence requires that it converge

00:25:15.680 | for any function of Hilbert space.

00:25:20.680 | If it converge for any function of Hilbert space,

00:25:24.280 | then you will say that this is weak convergence.
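
In the same assumed notation (a sequence f_l, a target f_0, and a test function phi from the Hilbert space), the definition just given can be written as:

```latex
% Weak convergence (convergence in functionals), as described above:
% every inner product with a fixed test function \phi converges.
\lim_{\ell \to \infty} \int \phi(x) \, f_\ell(x) \, dx
  = \int \phi(x) \, f_0(x) \, dx
\qquad \text{for all } \phi \in L_2
```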

00:25:28.320 | You can think that when you take integral,

00:25:32.240 | that is property, integral property of function.

00:25:36.000 | For example, if you will take sine or cosine,

00:25:39.200 | it is coefficient of, say, Fourier expansion.

00:25:43.960 | So if it converge for all coefficients

00:25:50.560 | of Fourier expansion, so under some condition,

00:25:54.280 | it converge to function you're looking for.

00:25:58.160 | But weak convergence means any property.

00:26:01.240 | Convergence not point-wise,

00:26:05.880 | but integral property of function.

00:26:08.660 | So weak convergence means integral property of functions.
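
A standard textbook example, not taken from the conversation, shows how the two notions differ: on the interval from 0 to 2*pi, the functions sin(n x) converge weakly to zero but not strongly. The sketch below checks this numerically with a simple midpoint rule. The test function phi(x) = x, the step count, and all names are assumptions. Analytically, the inner product with phi equals -2*pi/n, which shrinks as n grows, while the integral of sin(n x) squared stays equal to pi.

```python
import math


def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


A, B = 0.0, 2.0 * math.pi


def phi(x):
    """A fixed test function (hypothetical choice)."""
    return x


def inner_with_phi(n):
    """Inner product of phi with f_n(x) = sin(n x) on [A, B]."""
    return integrate(lambda x: phi(x) * math.sin(n * x), A, B)


def squared_distance_to_zero(n):
    """Integral of (f_n(x) - 0)^2, the quantity used for strong convergence."""
    return integrate(lambda x: math.sin(n * x) ** 2, A, B)


for n in (1, 10, 100):
    print(n, inner_with_phi(n), squared_distance_to_zero(n))
```

The inner product is only one integral property of the sequence. Weak convergence requires the same behavior for every test function, which this sketch does not verify.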

00:26:13.880 | When I'm talking about predicate,

00:26:16.120 | I would like to formulate which integral properties

00:26:21.120 | I would like to have for convergence.

00:26:27.920 | So, and if I will take one predicate,

00:26:32.920 | it's function which I measure property.

00:26:35.540 | If I will use one predicate and say,

00:26:40.640 | I will consider only function which give me

00:26:44.840 | the same value as this predicate,

00:26:47.960 | I selecting set of functions from functions

00:26:52.960 | which is admissible in the sense that function

00:26:57.680 | which I looking for in this set of functions.

00:27:01.080 | Because I checking in training data,

00:27:06.080 | it gives the same.

00:27:07.620 | - Yeah, so it always has to be connected

00:27:10.320 | to the training data in terms of--

00:27:12.640 | - Yeah, but property, you can know independent

00:27:17.400 | on training data.

00:27:18.800 | And this guy, Propp.

00:27:21.280 | - Yeah.

00:27:22.120 | - So there is formal property.

00:27:24.040 | 31 property and--

00:27:25.480 | - Fairy tale, Russian fairy tale.

00:27:27.280 | - But Russian fairy tale is not so interesting.

00:27:30.520 | More interesting is that people applied this

00:27:33.280 | to movies, to theater, to different things.

00:27:38.280 | The same works, they're universal.

00:27:42.000 | - Well, so I would argue that there's a little bit

00:27:44.800 | of a difference between the kinds of things

00:27:48.560 | that were applied to which are essentially stories

00:27:51.560 | and digit recognition.

00:27:53.420 | - It is the same story.

00:27:55.920 | - You're saying digits, there's a story within the digit.

00:27:59.720 | - Yeah.

00:28:00.560 | But my point is why I hope that it possible

00:28:06.480 | to beat record using not 60,000,

00:28:11.480 | but say 100 times less.

00:28:13.860 | Because instead you will give predicates.

00:28:16.580 | And you will select your decision not from

00:28:22.120 | wide set of function, but from set of function

00:28:25.720 | which keeps its predicates.

00:28:28.080 | But predicates is not related just to digit recognition.

00:28:32.840 | - Right, so--

00:28:33.880 | - Like in Plato's case.

00:28:35.520 | (laughing)

00:28:37.700 | - Do you think it's possible to automatically

00:28:40.240 | discover the predicates?

00:28:42.160 | So you basically said that the essence of intelligence

00:28:46.620 | is the discovery of good predicates.

00:28:49.680 | - Yeah.

00:28:50.500 | - Now the natural question is,

00:28:55.200 | you know, that's what Einstein was good at doing in physics.

00:28:58.200 | Can we make machines do these kinds of discovery

00:29:03.080 | of good predicates?

00:29:04.560 | Or is this ultimately a human endeavor?

00:29:06.840 | - That I don't know.

00:29:09.120 | I don't think that machine can do.

00:29:11.440 | Because according to theory about weak convergence,

00:29:16.440 | any function from Hilbert space can be predicate.

00:29:23.200 | So you have infinite number of predicate a priori,

00:29:27.680 | and before you don't know which predicate is good and which.

00:29:32.680 | But what Propp show and why people call it breakthrough,

00:29:37.840 | that there is not too many predicate which cover

00:29:44.920 | most of situation happened in the world.

00:29:51.360 | - So there's a sea of predicates.

00:29:53.200 | And most of the, only a small amount are useful

00:29:57.800 | for the kinds of things that happen in the world.

00:30:00.240 | - I think that I would say only small part of predicate,

00:30:06.360 | very useful.

00:30:08.720 | Useful all of them.

00:30:11.360 | - Only very few are what we should,

00:30:13.720 | let's call them good predicates.

00:30:15.480 | - Very good predicates.

00:30:16.680 | - Very good predicates.

00:30:18.160 | So can we linger on it, what's your intuition,

00:30:21.800 | why is it hard for a machine to discover good predicates?

00:30:26.800 | - Even in my talk described how to do predicate.

00:30:30.760 | How to find new predicate.

00:30:32.720 | I'm not sure that it is very good.

00:30:35.000 | - What did you propose in your talk?

00:30:36.680 | - No, in my talk I gave example for diabetes.

00:30:41.680 | - Diabetes, yeah.

00:30:43.800 | - When we achieve some percent,

00:30:46.240 | so then we're looking for area

00:30:48.440 | where some sort of predicate, which I formulate,

00:30:53.760 | does not,

00:30:55.880 | keeps invariant.

00:31:01.440 | So if it doesn't keep, I retrain my data,

00:31:07.000 | I select only function which keeps it invariant.

00:31:11.160 | And when I did it, I improved my performance.

00:31:14.500 | I can look for this predicate.

00:31:16.520 | I know technically how to do that.

00:31:19.560 | And you can, of course, do it using machine.

00:31:24.560 | But I'm not sure that we will construct

00:31:28.880 | the smartest predicate.

00:31:31.000 | - But this is the, allow me to linger on it,

00:31:34.200 | because that's the essence, that's the challenge,

00:31:36.320 | that is artificial, that's the human level intelligence

00:31:40.360 | that we seek, is the discovery of these good predicates.

00:31:43.840 | You've talked about deep learning as a way to,

00:31:47.520 | the predicates they use and the functions are mediocre.

00:31:52.520 | We can find better ones.

00:31:55.080 | - Let's talk about deep learning.

00:31:57.360 | - Sure, let's do it.

00:31:58.200 | - I know only Yann LeCun's convolutional network.

00:32:03.200 | And what else?

00:32:05.280 | I don't know, and it's a very simple convolution.

00:32:07.960 | - There's not much else to know.

00:32:08.800 | - It's left and right.

00:32:10.480 | I can do it like that, with one predicate.

00:32:14.120 | It is--

00:32:14.960 | - Convolution is a single predicate.

00:32:16.640 | - It's single, it's single predicate.

00:32:21.200 | - Yes, but--

00:32:22.040 | - You know exactly, you take the derivative

00:32:25.480 | for translation and predicate, this should be kept.

00:32:29.940 | - So that's a single predicate,

00:32:32.480 | but humans discovered that one, or at least--

00:32:35.040 | - There exist not too many predicates.

00:32:39.040 | And that is big story because Yann did it 25 years ago

00:32:43.760 | and nothing so clear was added to deep network.

00:32:48.760 | And then I don't understand

00:32:53.280 | why we should talk about deep network

00:32:57.440 | instead of talking about piecewise linear functions

00:33:01.280 | which keeps this predicate.

00:33:02.880 | - Well, a counter argument is

00:33:07.320 | that maybe the amount of predicates necessary

00:33:11.160 | to solve general intelligence, say in the space of images,

00:33:16.160 | doing efficient recognition of handwritten digits

00:33:20.600 | is very small.

00:33:22.360 | And so we shouldn't be so obsessed about finding,

00:33:26.000 | we'll find other good predicates

00:33:28.960 | like convolution, for example.

00:33:30.720 | You know, there has been other advancements

00:33:33.880 | like if you look at the work with attention,

00:33:37.400 | there's attentional mechanisms,

00:33:39.480 | and especially used in natural language,

00:33:42.160 | focusing the network's ability

00:33:44.200 | to learn at which part of the input to look at.

00:33:47.640 | The thing is, there's other things besides predicates

00:33:51.040 | that are important for the actual engineering mechanism

00:33:55.280 | of showing how much you can really do

00:33:58.080 | given such these predicates.

00:34:02.120 | - I mean, that's essentially the work of deep learning

00:34:04.360 | is constructing architectures

00:34:07.160 | that are able to be, given the training data,

00:34:11.400 | to be able to converge towards

00:34:14.720 | a function that can approximate, can generalize well.

00:34:21.340 | It's an engineering problem.

00:34:24.400 | - Yeah, I understand.

00:34:26.000 | But let's talk not on emotional level

00:34:29.880 | but on a mathematical level.

00:34:31.880 | You have set of piecewise linear functions.

00:34:36.440 | It is all possible neural networks.

00:34:40.140 | It's just piecewise linear functions.

00:34:44.040 | There's many, many pieces.
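
A minimal sketch of the "neural networks are piecewise linear functions" remark, assuming a one-hidden-layer ReLU network on a scalar input. The weights below are arbitrary, hypothetical values chosen only for illustration; the conversation does not specify any particular network.

```python
# Illustrative sketch: a tiny ReLU network is linear between its breakpoints.
# All weights are hypothetical values, not taken from the conversation.

def relu(z):
    return max(0.0, z)

# Hidden units as (weight, bias) pairs, plus output weights and output bias.
HIDDEN = [(1.0, 1.0), (1.0, -1.0), (-1.0, 2.0)]
OUT_W = [1.0, -2.0, 0.5]
OUT_B = 0.25

def net(x):
    """f(x) = OUT_B + sum_k OUT_W[k] * relu(w_k * x + b_k)."""
    return OUT_B + sum(v * relu(w * x + b) for v, (w, b) in zip(OUT_W, HIDDEN))

def breakpoints():
    """Inputs where a hidden unit switches on or off: x = -b / w."""
    return sorted(-b / w for w, b in HIDDEN if w != 0)

def is_linear_on(a, b, tol=1e-9):
    """True if net agrees with the straight line through (a, net(a)), (b, net(b))
    at three interior points of [a, b]."""
    for t in (0.25, 0.5, 0.75):
        x = a + t * (b - a)
        expected = net(a) + t * (net(b) - net(a))
        if abs(net(x) - expected) > tol:
            return False
    return True

if __name__ == "__main__":
    print("breakpoints:", breakpoints())
    print("linear on [-1, 1]:", is_linear_on(-1.0, 1.0))
    print("linear on [0, 1.5]:", is_linear_on(0.0, 1.5))
```

Between two neighbouring breakpoints the function is a straight line; an interval that crosses a breakpoint is not. A wider or deeper network has more pieces, but each piece is still linear.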

00:34:45.360 | - Large, large number of piecewise linear functions.

00:34:47.640 | - Exactly, but-- - Very large.

00:34:49.440 | - Very large. - Almost feels like

00:34:51.280 | too large. - But it's still simpler

00:34:53.280 | than say convolution,

00:34:56.160 | than reproducing kernel Hilbert space

00:34:58.840 | which have a Hilbert set of functions.

00:35:00.880 | - What's Hilbert space?

00:35:03.000 | - It's space with infinite number of coordinates,

00:35:07.160 | say, or function for expansion, something like that.

00:35:11.820 | So it's much richer.

00:35:13.460 | So when I'm talking about closed form solution,

00:35:17.520 | I'm talking about this set of function,

00:35:20.840 | not piecewise linear set which is particular case.

00:35:29.560 | It is small part--

00:35:30.920 | - So neural networks is a small part of the space

00:35:33.600 | you're talking about, of functions you're talking about.

00:35:35.960 | - Say, small set of functions.

00:35:39.080 | Let me take that.

00:35:40.640 | But it is fine, it is fine.

00:35:42.760 | I don't want to discuss the small or big,

00:35:46.600 | you take advantage.

00:35:47.960 | So you have some set of functions.

00:35:50.080 | So now when you're trying to create architecture,

00:35:54.360 | you would like to create admissible set of functions

00:35:59.120 | all your tricks to use not all functions,

00:36:03.360 | but some subset of this set of functions.

00:36:06.000 | Say, when you're introducing convolutional net,

00:36:10.140 | it is way to make this subset useful for you.

00:36:15.140 | But from my point of view, convolutional,

00:36:19.800 | it is something you want to keep some invariants,

00:36:24.800 | say translation invariants.

00:36:27.980 | But now if you understand this,

00:36:31.840 | and you cannot explain on the level of ideas

00:36:36.840 | what neural network does,

00:36:39.740 | you should agree that it is much better

00:36:44.400 | to have a set of functions.

00:36:46.720 | As I say, this set of functions should be admissible,

00:36:51.140 | it must keep this invariant, this invariant,

00:36:53.640 | and that invariant.

00:36:55.260 | You know that as soon as you incorporate new invariants,

00:36:59.080 | set of function becomes smaller and smaller and smaller.

00:37:02.160 | - But all the invariants are specified by you, the human.

00:37:05.540 | - Yeah, but what I hope that there is a standard predicate,

00:37:11.740 | like Propp shows,

00:37:14.200 | that what I want to find for digit recognition.

00:37:19.640 | If we start, it is completely new area,

00:37:22.960 | what is intelligence about on the level

00:37:25.840 | starting from Plato's idea,

00:37:28.640 | what is world of ideas.

00:37:30.900 | So, and I believe that it's not too many.

00:37:34.780 | But you know, it is amusing that mathematician

00:37:39.800 | doing something in neural network, in general function,

00:37:44.040 | but people from literature, from art,

00:37:47.600 | they use this all the time.

00:37:49.480 | - That's right.

00:37:50.320 | - New invariants saying, say,

00:37:53.800 | it is great how people describe music,

00:37:57.040 | we should learn from that.

00:37:58.800 | And something on this level,

00:38:02.080 | but so why Vladimir Propp,

00:38:04.960 | who was just theoretical,

00:38:08.200 | who studied theoretical literature, he found that.

00:38:12.280 | - You know what, let me throw that right back at you,

00:38:15.240 | because there's a little bit of a,

00:38:17.360 | that's less mathematical and more emotional,

00:38:20.080 | philosophical, Vladimir Propp.

00:38:22.760 | I mean, he wasn't doing math.

00:38:25.000 | - No.

00:38:25.840 | - And you just said another emotional statement,

00:38:30.120 | which is you believe that this Plato world of ideas

00:38:34.000 | is small.

00:38:34.880 | - I hope.

00:38:37.040 | - I hope.

00:38:37.880 | What's your intuition, though, if we can linger on it?

00:38:43.600 | - You know, it is not just small or big.

00:38:48.640 | I know exactly that when I'm introducing

00:38:53.000 | some predicate, I decrease set of functions.

00:38:58.920 | But my goal to decrease set of function much.

00:39:02.940 | - By as much as possible.

00:39:05.040 | - By as much as possible.

00:39:06.520 | Good predicate, which does this.

00:39:11.120 | Then I should choose next predicate,

00:39:13.360 | which decreases as much as possible.

00:39:17.280 | So set of good predicate,

00:39:19.440 | it is such that they decrease

00:39:23.040 | this amount of admissible function.

00:39:27.840 | - So if each good predicate significantly reduces

00:39:31.060 | the set of admissible functions,

00:39:32.640 | then there naturally should not be that many good predicates.

00:39:35.600 | - No, but if you reduce very well the VC dimension

00:39:40.600 | of the function, of admissible set of function,

00:39:45.560 | it's small, and you need not too much

00:39:49.160 | training data to do well.

00:39:51.260 | - And VC dimension, by the way,

00:39:55.360 | is some measure of capacity of this set of functions.

00:39:57.760 | - Right.

00:39:58.600 | Roughly speaking, how many function in this set.
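
A small sketch of VC dimension as a capacity measure, assuming a finite toy domain and two hypothetical function classes (thresholds and intervals on the integers 0..4). It uses the standard shattering definition by brute force; the classes and the domain are illustrative choices, not anything discussed in the conversation.

```python
from itertools import combinations

def labelings(points, hypotheses):
    """All distinct label patterns the hypotheses produce on the given points."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def shatters(points, hypotheses):
    """A point set is shattered if every one of the 2^k labelings is realized."""
    return len(labelings(points, hypotheses)) == 2 ** len(points)

def vc_dimension(domain, hypotheses):
    """Largest k such that some k-point subset of the (finite) domain is shattered."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(s, hypotheses) for s in combinations(domain, k)):
            d = k
        else:
            break  # if no k-point set is shattered, no larger set can be
    return d

# Hypothetical toy domain and function classes.
DOMAIN = [0, 1, 2, 3, 4]
thresholds = [lambda x, t=t: int(x >= t) for t in range(0, 6)]
intervals = [lambda x, a=a, b=b: int(a <= x <= b)
             for a in range(5) for b in range(a, 5)] + [lambda x: 0]

if __name__ == "__main__":
    print("thresholds:", vc_dimension(DOMAIN, thresholds))
    print("intervals:", vc_dimension(DOMAIN, intervals))
```

The interval class contains more distinct functions on this domain and has the larger VC dimension, which matches the rough "how many functions in this set" reading given here.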

00:40:02.000 | So you're decreasing, decreasing,

00:40:04.000 | and it makes easy for you to find

00:40:07.720 | function you're looking for.

00:40:09.120 | But the most important part to create

00:40:13.400 | good admissible set of functions.

00:40:15.800 | And it probably, there are many ways,

00:40:18.880 | but the good predicate, it's such that can do that.

00:40:23.880 | So for this duck, you should know a little bit about duck,

00:40:30.560 | because--

00:40:31.600 | - What are the three fundamental laws of ducks?

00:40:35.360 | - Looks like a duck, swims like a duck,

00:40:37.440 | and quacks like a duck.

00:40:38.400 | - You should know something about ducks to be able to--

00:40:41.200 | - Not necessarily.

00:40:42.560 | Looks like, say, horse.

00:40:45.000 | It's also good.

00:40:45.840 | - It generalizes from ducks.

00:40:50.000 | - And talk like, and make sound like horse, or something.

00:40:54.400 | And run like horse, and moves like horse.

00:40:57.380 | It is general.

00:40:58.520 | It is general predicate that this applied to duck.

00:41:04.640 | But for duck, you can say, play chess like duck.

00:41:09.900 | - You cannot say, play chess like duck.

00:41:11.620 | - Why not?

00:41:12.680 | - So you're saying you can, but that would not be a good--

00:41:15.800 | - No, you will not reduce a lot of functions.

00:41:18.240 | - You would not do, yeah, you would not reduce

00:41:20.200 | the set of functions.

00:41:21.700 | - So you can, the story is, formal story,

00:41:25.140 | mathematical story, is that you can use any function

00:41:28.840 | you want as a predicate.

00:41:30.340 | But some of them are good, some of them are not,

00:41:33.200 | because some of them reduce a lot of functions

00:41:36.120 | to admissible set.

00:41:38.040 | Some of them--
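
The good-versus-bad predicate point can be sketched on a toy problem. This assumes one particular formalization of "keeping an invariant": for a predicate psi, an admissible function f must satisfy sum psi(x_i) f(x_i) = sum psi(x_i) y_i over the training examples. The four-point domain, the labels, and the three predicates are hypothetical, chosen so that every binary function can be enumerated.

```python
from itertools import product

X = [0, 1, 2, 3]   # toy inputs (hypothetical)
Y = [0, 0, 1, 1]   # toy labels (hypothetical)

def all_functions():
    """Every binary function on X, represented as a tuple of its values."""
    return list(product([0, 1], repeat=len(X)))

def keeps_invariant(f, psi):
    """Check sum psi(x) * f(x) == sum psi(x) * y over the training examples."""
    lhs = sum(psi(x) * fx for x, fx in zip(X, f))
    rhs = sum(psi(x) * y for x, y in zip(X, Y))
    return lhs == rhs

def admissible(predicates):
    """Functions that keep the invariant for every predicate in the list."""
    return [f for f in all_functions()
            if all(keeps_invariant(f, psi) for psi in predicates)]

# Three hypothetical predicates.
psi_zero = lambda x: 0    # uninformative on this data: removes nothing
psi_const = lambda x: 1   # keeps the number of positive labels
psi_x = lambda x: x       # keeps the first moment of the positives

if __name__ == "__main__":
    print(len(admissible([])))                  # 16
    print(len(admissible([psi_zero])))          # 16
    print(len(admissible([psi_const])))         # 6
    print(admissible([psi_const, psi_x]))       # [(0, 0, 1, 1)]
```

Here `psi_zero` plays the role of "plays chess like a duck": it is a legal predicate, but it does not shrink the set. The other two together cut sixteen candidates down to one.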

00:41:39.760 | - But the question is, and I'll probably keep asking

00:41:42.120 | this question, but how do we find such,

00:41:45.680 | what's your intuition?

00:41:47.400 | Handwritten recognition, how do we find

00:41:51.080 | the answer to your challenge?

00:41:52.680 | - Yeah, I understand it like that.

00:41:55.980 | I understand what--

00:41:57.920 | - What defined?

00:41:59.240 | - What it means, a new predicate.

00:42:01.480 | Like, guy who understand music can say this word

00:42:06.240 | which he described when he listened to music.

00:42:09.640 | He understand music.

00:42:11.720 | He use not too many different, or you can do like Propp.

00:42:15.600 | You can make collection what he talking about music,

00:42:19.320 | about this, about that.

00:42:21.040 | It's not too many different situation he described.

00:42:25.040 | - Because we mentioned Vladimir Propp a bunch,

00:42:26.960 | let me just mention, there's a sequence of 31

00:42:30.200 | structural notions that are common in stories,

00:42:36.920 | and I think--

00:42:37.760 | - He called units.

00:42:38.600 | - Units, and I think they resonate.

00:42:40.480 | I mean, it starts, just to give an example,

00:42:43.600 | absentation, a member of the hero's community or family

00:42:46.520 | leaves the security of the home environment,

00:42:48.920 | then it goes to the interdiction,

00:42:51.040 | a forbidding edict or command is passed upon the hero,

00:42:54.520 | don't go there, don't do this.

00:42:56.620 | The hero's warned against some action.

00:42:58.680 | Then, step three, violation of interdiction.

00:43:03.680 | Break the rules, break out on your own.

00:43:07.580 | Then, reconnaissance, the villain makes an effort

00:43:10.400 | to attain knowledge needed to fulfill their plot,

00:43:12.760 | so on, it goes on like this,

00:43:14.240 | ends in a wedding, number 31, happily ever after.

00:43:19.240 | - No, he just gave description of all situation.

00:43:25.640 | He understands this world.

00:43:28.160 | - Of folk tales.

00:43:29.280 | - Yeah, not folk, but stories.

00:43:33.160 | And these stories are not just in folk tales.

00:43:36.560 | The stories are in detective serials as well.

00:43:39.960 | - And probably in our lives, we probably live--

00:43:43.760 | - Read this.

00:43:45.080 | At the end, they wrote that this predicate is good

00:43:50.080 | for different situation, for movie, for theater.

00:43:56.440 | - By the way, there's also criticism, right?

00:44:00.640 | There's another way to interpret narratives

00:44:03.840 | from Claude Lévi-Strauss.

00:44:08.840 | - I don't know.

00:44:10.920 | I am not in this business.

00:44:12.640 | - No, I know, it's theoretical literature,

00:44:14.400 | but it's looking at paradigms behind the scenes.

00:44:17.240 | - It's always the--

00:44:18.240 | - Philosophers argue. - Discussion, yeah.

00:44:20.160 | But at least there is units.

00:44:23.800 | It's not too many units that can describe,

00:44:27.200 | but this guy probably gives another units,

00:44:30.880 | or another way of--

00:44:31.760 | - Exactly, another set of units.

00:44:34.440 | - Another set of predicates.

00:44:35.960 | Doesn't matter how, but they exist, probably.

00:44:40.960 | - My question is whether given those units,

00:44:46.240 | whether without our human brains to interpret these units,

00:44:50.360 | they would still hold as much power as they have.

00:44:53.480 | Meaning, are those units enough

00:44:56.220 | when we give them to an alien species?

00:44:58.880 | - Let me ask you.

00:45:00.320 | Do you understand digit images?

00:45:05.320 | - No, I don't understand.

00:45:07.720 | - No, no, no.

00:45:08.640 | When you can recognize these digit images,

00:45:11.000 | it means that you understand.

00:45:12.480 | - Yes, I understand.

00:45:14.200 | - You understand characters, you understand--

00:45:17.280 | - No, no, no, no.

00:45:18.920 | It's the imitation versus understanding question,

00:45:25.480 | because I don't understand the mechanism

00:45:28.360 | by which I understand.

00:45:29.200 | - No, no, I'm not talking about,

00:45:30.480 | I'm talking about predicates.

00:45:32.800 | You understand that it involves symmetry,

00:45:35.160 | maybe structure, maybe something.

00:45:37.440 | I cannot formulate.

00:45:38.720 | I just was able to find symmetries,

00:45:41.840 | so degree of symmetries.

00:45:43.680 | - That's really good.

00:45:44.520 | So this is a good line.

00:45:46.400 | I feel like I understand the basic elements

00:45:50.600 | of what makes a good handwritten recognition system on my own.

00:45:54.320 | Like symmetry connects with me.

00:45:56.440 | It seems like that's a very powerful predicate.

00:45:59.160 | My question is, is there a lot more going on

00:46:02.400 | that we're not able to introspect?

00:46:04.500 | Maybe I need to be able to understand

00:46:09.640 | a huge amount in the world of ideas,

00:46:13.080 | thousands of predicates, millions of predicates,

00:46:18.440 | in order to do handwritten recognition.

00:46:20.600 | - I don't think so.

00:46:21.560 | - So you're--

00:46:24.840 | - Both your hope and your intuition

00:46:26.560 | are such that-- - No, let me explain.

00:46:28.960 | You're using digits, you're using examples as well.

00:46:33.500 | Theory says that if you will use

00:46:37.640 | all possible functions

00:46:42.480 | from Hilbert space, all possible predicate,

00:46:46.320 | you don't need training data.

00:46:47.960 | You just will have admissible set of functions

00:46:53.800 | which contain one function.

00:46:55.200 | - Yes.

00:46:57.120 | So the trade-off is when you're not using all predicates,

00:47:01.120 | you're only using a few good predicates,

00:47:02.960 | you need to have some training data.

00:47:05.000 | - Yes, exactly.

00:47:06.760 | - The more good predicates you have,

00:47:08.440 | the less training data you need.

00:47:09.680 | - Exactly.

00:47:10.960 | That is intelligent learning.

00:47:13.280 | - Still, okay.

00:47:14.720 | I'm gonna keep asking the same dumb question,

00:47:17.400 | handwritten recognition.

00:47:19.120 | To solve the challenge, you kind of propose a challenge

00:47:21.560 | that says we should be able to get state-of-the-art

00:47:24.640 | MNIST error rates by using very few,

00:47:28.800 | 60, maybe fewer examples per digit.

00:47:31.520 | What kind of predicates do you think you'll--

00:47:35.720 | - That is the challenge.

00:47:37.560 | So people who will solve this problem--

00:47:39.840 | - They will answer.

00:47:40.680 | - They will answer.

00:47:41.520 | - Do you think they'll be able to answer it

00:47:44.780 | in a human explainable way?

00:47:46.580 | - They just need to write function, that's it.

00:47:50.820 | - But, so can that function be written, I guess,

00:47:54.320 | by an automated reasoning system?

00:47:58.740 | Whether we're talking about a neural network

00:48:01.120 | learning a particular function, or another mechanism?

00:48:05.080 | - No, I'm not against neural network.

00:48:08.560 | I'm against admissible set of function

00:48:11.600 | which create neural network.

00:48:13.720 | You did it by hand.

00:48:15.240 | You don't do it by invariance,

00:48:19.880 | by predicate, by reason.

00:48:23.360 | - But neural networks can then reverse,

00:48:26.380 | do the reverse step of helping you find a function.

00:48:29.940 | Just, the task of a neural network

00:48:33.600 | is to find a disentangled representation, for example,

00:48:38.180 | that they call, is to find that one predicate function

00:48:42.100 | that really captures some kind of essence.

00:48:45.180 | One, not the entire essence,

00:48:46.860 | but one very useful essence of this particular visual space.

00:48:51.860 | Do you think that's possible?

00:48:54.060 | Listen, I'm grasping, hoping there's an automated way

00:48:58.620 | to find good predicates.

00:49:00.300 | So the question is, what are the mechanisms

00:49:02.980 | of finding good predicates, ideas,

00:49:05.740 | that you think we should pursue?

00:49:08.020 | A young grad student listening right now.

00:49:10.020 | - I gave example.

00:49:13.420 | So find situation where predicate,

00:49:18.420 | which you're suggesting, don't create invariant.

00:49:25.000 | It's like in physics.

00:49:28.820 | Find situation where existing theory cannot explain it.

00:49:33.820 | - Find situation where the existing theory

00:49:39.380 | can't explain it. - Theory cannot explain

00:49:40.700 | this situation. - So you're finding

00:49:41.580 | contradictions.

00:49:42.780 | - Find contradiction, and then remove this contradiction.

00:49:46.140 | But in my case, what means contradiction,

00:49:48.940 | you find function, which, if you will use this function,

00:49:53.500 | you're not keeping invariants.

00:49:55.060 | - So it's really the process of discovering contradictions.

00:50:01.300 | - Yeah.

00:50:02.140 | It is like in physics.

00:50:05.900 | Find situation where you have contradiction

00:50:09.820 | for one of the property.

00:50:12.960 | For one of the predicate.

00:50:15.520 | Then include this predicate, making invariants,

00:50:19.040 | and solve again this problem.

00:50:20.480 | Now you don't have contradiction.

00:50:22.120 | But it is not the best way, probably, I don't know,

00:50:28.320 | to looking for predicate.
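
The brute-force loop just described (find a predicate whose invariant the current function breaks, add it, solve again) can be sketched as follows. This assumes the same sum-based reading of an invariant, sum psi(x_i) f(x_i) = sum psi(x_i) y_i, and uses a hypothetical toy domain, labels, and candidate predicates. "Solving" is stood in for by picking the first function in enumeration order that keeps all chosen invariants, which is a simplification for illustration.

```python
from itertools import product

X = [0, 1, 2, 3]   # toy inputs (hypothetical)
Y = [1, 0, 1, 1]   # toy labels (hypothetical)

# Hypothetical pool of candidate predicates.
CANDIDATES = {
    "const": lambda x: 1,
    "x": lambda x: x,
    "x_squared": lambda x: x * x,
}

def violation(f, psi):
    """Size of the contradiction: |sum psi(x) * (f(x) - y)| on the examples."""
    return abs(sum(psi(x) * (fx - y) for x, fx, y in zip(X, f, Y)))

def most_violated(f, chosen):
    """Name of the unused predicate that f contradicts most, or None if none."""
    scores = {name: violation(f, psi)
              for name, psi in CANDIDATES.items() if name not in chosen}
    if not scores:
        return None
    name = max(scores, key=scores.get)
    return name if scores[name] > 0 else None

def solve(chosen):
    """Stand-in solver: first binary function keeping every chosen invariant."""
    for f in product([0, 1], repeat=len(X)):
        if all(violation(f, CANDIDATES[n]) == 0 for n in chosen):
            return f
    return None

def brute_force_loop():
    chosen = []
    f = solve(chosen)
    while True:
        name = most_violated(f, chosen)
        if name is None:
            return f, chosen      # no contradiction left among the candidates
        chosen.append(name)       # include the predicate as an invariant
        f = solve(chosen)         # and solve the problem again

if __name__ == "__main__":
    print(brute_force_loop())
```

On this toy data the loop adds two predicates before no candidate is contradicted. It only ever chooses among predicates someone already wrote down, which is why it says nothing about where smart predicates come from.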

00:50:32.000 | - It's just one way, okay.

00:50:33.600 | - That, no, no, it is brute force way.

00:50:35.920 | - The brute force way.

00:50:37.320 | What about the ideas of, what,

00:50:42.280 | big umbrella term of symbolic AI.

00:50:44.680 | There's what, in the '80s, with expert systems,

00:50:48.520 | sort of logic, reasoning-based systems.

00:50:51.400 | Is there hope there to find some,

00:50:55.720 | through sort of deductive reasoning,

00:51:00.480 | to find good predicates?

00:51:04.440 | - I don't think so.

00:51:07.680 | I think that just logic is not enough.

00:51:12.000 | - It's kind of a compelling notion, though.

00:51:14.400 | You know, that when smart people sit in a room

00:51:17.600 | and reason through things, it seems compelling.

00:51:20.360 | And making our machines do the same is also compelling.

00:51:23.540 | - So everything is very simple.

00:51:27.820 | When you have infinite number of predicate,

00:51:34.080 | you can choose the function you want.

00:51:38.600 | You have invariants, and you can choose the function you want.

00:51:42.540 | But you have to have not too many invariants

00:51:47.540 | to solve the problem.

00:51:53.000 | So, and how from infinite number of function,

00:51:59.940 | to select finite number, and hopefully small number

00:52:04.940 | of functions, which is good enough

00:52:11.080 | to extract small set of admissible functions.

00:52:16.680 | So they will be admissible, it's for sure,

00:52:19.800 | because every function just decrease set of function

00:52:23.880 | and leaving it admissible.

00:52:25.680 | But it will be small.

00:52:27.720 | - But why do you think logic-based systems can't help?

00:52:32.720 | Intuition, not--

00:52:35.280 | - Because you should know reality.

00:52:37.800 | You should know life.

00:52:39.480 | This guy like Propp, he knows something.

00:52:44.280 | And he tried to put in invariant his understanding.

00:52:49.280 | - But that's the human, yeah, but see,

00:52:51.560 | you're putting too much value into

00:52:54.460 | Vladimir Propp's knowing something.

00:52:57.900 | - No, it is--

00:52:59.460 | - I'm minding the subject.

00:53:01.100 | - What means you know life?

00:53:02.900 | What it means?

00:53:05.380 | - You know common sense.

00:53:07.020 | - No, no.

00:53:08.380 | You know something.

00:53:10.380 | Common sense, it is some rules.

00:53:13.420 | - You think so?

00:53:14.820 | Common sense is simply rules?

00:53:17.180 | Common sense is everything, it's mortality,

00:53:21.820 | it's fear of death, it's love, it's spirituality,

00:53:26.820 | it's happiness and sadness.

00:53:30.820 | All of it is tied up into understanding gravity,

00:53:34.420 | which is what we think of as common sense.

00:53:36.860 | - I don't really want to discuss so wide.

00:53:39.820 | I want to discuss, understand,

00:53:42.400 | digit recognition.

00:53:45.420 | - Any time I bring up love and death,

00:53:47.660 | you bring it back to digit recognition.

00:53:50.460 | - Yeah, no, you know, it is doable

00:53:52.980 | because there is a challenge,

00:53:54.900 | which I see how to solve it.

00:53:59.260 | If I will have a student concentrate on this work,

00:54:02.500 | I will suggest something to solve.

00:54:04.780 | - You mean handwritten recognition?

00:54:06.860 | Yeah, it's a beautifully simple, elegant, and yet--

00:54:10.780 | - I think that I know invariants which will solve this.

00:54:13.440 | - You do?

00:54:14.280 | - I think so, yes.

00:54:15.940 | But it is not universal, it is maybe,

00:54:20.940 | I want some universal invariants which are good

00:54:25.060 | not only for digit recognition, for image understanding.

00:54:28.540 | - So let me ask, how hard do you think

00:54:34.180 | is 2D image understanding?

00:54:37.100 | So if we can kind of intuit handwritten recognition,

00:54:43.820 | how big of a step, leap, journey is it from that?

00:54:48.820 | If I gave you good, if I solved your challenge

00:54:51.980 | for handwritten recognition,

00:54:53.620 | how long would my journey then be from that

00:54:56.520 | to understanding more general natural images?

00:54:59.380 | - Immediately, you will understand this

00:55:01.940 | as soon as you will make a record.

00:55:04.060 | Because it is not for free.

00:55:07.740 | As soon as you will create several invariants

00:55:13.020 | which will help you to get the same performance

00:55:18.020 | that the best neural net did,

00:55:22.820 | using 100 times, maybe more than 100 times less examples,

00:55:27.820 | you have to have something smart to do that.

00:55:31.260 | - And you're saying--

00:55:32.260 | - That is invariant, it is predicate.

00:55:35.220 | Because you should put some idea how to do that.

00:55:39.460 | - But okay, let me just pause.

00:55:42.380 | Maybe it's a trivial point, maybe not.

00:55:44.500 | But handwritten recognition feels like a 2D,

00:55:48.820 | two-dimensional problem.

00:55:50.440 | And it seems like how much complicated is the fact

00:55:55.340 | that most images are a projection

00:55:58.020 | of a three-dimensional world onto a 2D plane.

00:56:03.020 | It feels like for a three-dimensional world,

00:56:05.900 | we need to start understanding common sense

00:56:08.660 | in order to understand an image.

00:56:10.920 | It's no longer visual shape and symmetry.

00:56:16.980 | It's having to start to understand concepts,

00:56:20.740 | understand life.

00:56:22.100 | - Yeah.

00:56:22.940 | You're talking that there are different invariants.

00:56:27.300 | Different predicates, yeah.

00:56:28.900 | - And potentially much larger number.

00:56:32.500 | - You know, maybe.

00:56:34.340 | But let's start from simple.

00:56:36.340 | - Well, yeah, but you said that it would be--

00:56:38.020 | - But you know, I cannot think about things

00:56:41.420 | which I don't understand.

00:56:43.300 | This I understand.

00:56:44.820 | But I'm sure that I don't understand everything there.

00:56:48.460 | - Yeah, that's the difference.

00:56:49.300 | - It's like Einstein said, do as simple as possible,

00:56:53.140 | but not simpler.

00:56:54.380 | And that is exact case.

00:56:56.560 | - With handwritten recognition.

00:56:57.400 | - With handwritten.

00:56:58.980 | - Yeah, but never, that's the difference between you and I.

00:57:04.940 | I welcome and enjoy thinking about things

00:57:07.940 | I completely don't understand.

00:57:09.900 | Because to me, it's a natural extension

00:57:12.380 | without having solved handwritten recognition

00:57:15.140 | to wonder how difficult is the next step

00:57:20.140 | of understanding 2D, 3D images.

00:57:25.680 | Because ultimately, while the science of intelligence

00:57:29.260 | is fascinating, it's also fascinating to see

00:57:31.700 | how that maps to the engineering of intelligence.

00:57:34.700 | And recognizing handwritten digits is not,

00:57:39.340 | doesn't help you, it might, it may not help you

00:57:43.100 | with the problem of general intelligence.

00:57:46.560 | We don't know.

00:57:47.400 | It'll help you a little bit, we don't know how much.

00:57:49.500 | - It's unclear.

00:57:50.340 | - It's unclear.

00:57:51.160 | - Yeah.

00:57:52.000 | - It might very much.

00:57:52.840 | - But I would like to make a remark.

00:57:53.660 | - Yes.

00:57:54.500 | - I start not from very primitive problem,

00:57:58.780 | make a challenge problem.

00:58:03.100 | I start with very general problem, with Plato.

00:58:06.800 | So you understand, and it comes from Plato

00:58:10.660 | to digit recognition.

00:58:13.660 | So--

00:58:14.500 | - So you basically took Plato and the world of forms

00:58:19.140 | and ideas and mapped and projected into the clearest,

00:58:23.900 | simplest formulation of that big world.

00:58:26.820 | - You know, I would say that I did not understand Plato

00:58:31.540 | until recently, and until I consider weak convergence

00:58:36.540 | and then predicate and then, oh, this is what Plato taught.

00:58:43.380 | - So--

00:58:46.300 | - Can you linger on that?

00:58:47.120 | Like why, how do you think about this world of ideas

00:58:50.180 | and world of things in Plato?

00:58:51.980 | - No, it is metaphor, it is--

00:58:54.860 | - It's a metaphor for sure.

00:58:55.820 | - Yeah.

00:58:56.660 | - It's a poetic and a beautiful metaphor.

00:58:57.820 | - Yeah, yeah, yeah.

00:58:58.740 | - But what, can you--

00:59:00.540 | - But it is a way how you should try to understand

00:59:04.980 | how to attack ideas in the world.

00:59:07.900 | So from my point of view, it is very clear,

00:59:12.900 | but it is a line.

00:59:14.900 | All the time people looking for that.

00:59:17.540 | Say, Plato, and Hegel: whatever is reasonable, it exists,

00:59:22.540 | whatever exists, it is reasonable.

00:59:26.700 | I don't know what he have in mind, reasonable.

00:59:30.240 | - Right, there's philosophers again.

00:59:31.580 | - No, no, no, no, no, no, no, no.

00:59:33.300 | It is next stop of Wigner, that mathematics

00:59:38.100 | understand something of reality.

00:59:40.740 | It is the same Plato line.

00:59:42.440 | And then it comes suddenly to Vladimir Propp.

00:59:47.100 | Look, 31 ideas, 31 units, and describes everything.

00:59:52.880 | - There's abstractions, ideas that represent our world.

01:00:00.160 | And we should always try to reach into that.

01:00:03.320 | - Yeah, but you should make a projection on reality.

01:00:07.520 | But understanding is, it is abstract ideas.

01:00:11.820 | You have in your mind several abstract ideas

01:00:15.880 | which you can apply to reality.

01:00:17.800 | - And reality in this case,

01:00:19.160 | so if you look at machine learning, is data.

01:00:21.400 | - It's example, data.

01:00:22.720 | - Data.

01:00:24.080 | Okay, let me put this on you,

01:00:26.280 | because I'm an emotional creature.

01:00:28.360 | I'm not a mathematical creature like you.

01:00:30.780 | I find compelling the idea,

01:00:33.400 | forget the space, the sea of functions.

01:00:36.660 | There's also a sea of data in the world.

01:00:39.520 | And I find compelling that there might be,

01:00:42.280 | like you said, teacher, small examples of data

01:00:47.280 | that are most useful for discovering good,

01:00:52.620 | whether it's predicates or good functions,

01:00:55.540 | that the selection of data may be a powerful journey,

01:01:00.300 | a useful, you know, coming up with a mechanism

01:01:03.740 | for selecting good data might be useful too.

01:01:06.460 | Do you find this idea of finding the right data set

01:01:12.420 | interesting at all?

01:01:13.960 | Or do you kind of take the data set as a given?

01:01:16.680 | - I think that it is, you know, my scheme is very simple.

01:01:22.620 | You have huge set of functions.

01:01:25.880 | If you will apply, and you have not too many data,

01:01:30.880 | if you pick up function which describes this data,

01:01:36.480 | you will do not very well.

01:01:39.940 | - Like randomly pick up?

01:01:42.240 | - Yeah, you will overfit, it will be overfitting.

01:01:45.440 | So you should decrease set of function

01:01:50.160 | from which you're picking up one.

01:01:53.660 | So you should go somehow to admissible set of function.

01:01:58.080 | And this is what weak convergence is about.

01:02:02.360 | So but, from another point of view,

01:02:07.240 | to make admissible set of function,

01:02:13.200 | you need just a data, just function

01:02:15.320 | which you will take in inner product,

01:02:19.400 | which you will measure property of your function.

01:02:24.400 | And that is how it works.

01:02:31.180 | - No, I get it, I get it, I understand it,

01:02:32.740 | but do you, the reality is--

01:02:34.980 | - But let's think about examples.

01:02:39.140 | You have huge set of function,

01:02:41.860 | and you have several examples.

01:02:44.640 | If you just trying to take function

01:02:49.640 | which satisfies these examples,

01:02:52.580 | you still will overfit.

01:02:55.620 | You need decrease, you need admissible set of function.

01:02:59.220 | - Yeah, absolutely.

01:03:00.160 | But what, say you have more data than functions.

01:03:05.060 | So sort of consider the, I mean,

01:03:08.300 | maybe not more data than functions,

01:03:09.800 | 'cause that's-- - It's impossible.

01:03:11.300 | - Impossible.

01:03:12.140 | But what, I was trying to be poetic for a second.

01:03:15.160 | I mean, you have a huge amount of data,

01:03:17.200 | a huge amount of examples.

01:03:19.880 | - But amount of function can be even--

01:03:22.400 | - It can get bigger, I understand.

01:03:24.360 | - Everything can--

01:03:25.520 | - There's always a bigger boat.

01:03:27.560 | - Full Hilbert space.

01:03:29.280 | - I gotcha.

01:03:30.260 | But okay.

01:03:31.840 | But you don't find the world of data

01:03:35.840 | to be an interesting optimization space.

01:03:38.760 | Like the optimization should be in the space of functions.

01:03:42.280 | - In creating admissible set of function.

01:03:46.980 | - Admissible set of function.

01:03:48.140 | - No, you know, even from the classical VC theory,

01:03:52.440 | from structural risk minimization,

01:03:56.380 | you should organize function in the way

01:04:01.380 | that they will be useful for you.

01:04:06.540 | - Right.

01:04:07.540 | - And that is admissible set.

01:04:10.300 | - The way you're thinking about useful

01:04:12.620 | is you're given a small set of example.

01:04:16.940 | - Useful small.

01:04:17.820 | Small set of function which contain function I'm looking for.

01:04:21.820 | - Yeah, but looking for based on

01:04:25.300 | the empirical set of small examples.

01:04:27.620 | - Yeah, but that is another story, I don't touch it.

01:04:31.180 | Because I believe that this small examples

01:04:35.740 | is not too small.

01:04:37.380 | So 60 per class, law of large numbers works.

01:04:41.380 | I don't need uniform law.

01:04:43.380 | The story is that in statistics there are two law.

01:04:46.740 | Law of large numbers and uniform law of large numbers.

01:04:51.100 | So I want to be in situation where I use law

01:04:55.060 | of large numbers but not uniform law of large numbers.
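
The two laws contrasted here can be written out formally. This is a sketch in standard notation; the sample size \ell, the function class \mathcal{F}, and the tolerance \varepsilon are symbols assumed for illustration and do not come from the conversation.

```latex
% Law of large numbers: convergence for one fixed function f
\lim_{\ell \to \infty}
P\left\{ \left| \frac{1}{\ell} \sum_{i=1}^{\ell} f(x_i) - \mathbb{E} f(x) \right| > \varepsilon \right\} = 0
\quad \text{for every } \varepsilon > 0.

% Uniform law of large numbers: convergence simultaneously over a whole class F
\lim_{\ell \to \infty}
P\left\{ \sup_{f \in \mathcal{F}} \left| \frac{1}{\ell} \sum_{i=1}^{\ell} f(x_i) - \mathbb{E} f(x) \right| > \varepsilon \right\} = 0
\quad \text{for every } \varepsilon > 0.
```

The first statement concerns a function chosen before seeing the data, such as a fixed predicate. The second is needed when the function is selected after looking at the data.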

01:04:58.260 | - Right, so 60 is law of large, it's large enough.

01:05:01.420 | - I hope, no, it still need some evaluation,

01:05:05.580 | some bounds, so it's, but the idea is the following.

01:05:10.060 | If you trust that, say, this average gives you

01:05:16.580 | something close to expectation,

01:05:21.020 | so you can talk about that, about this predicate.

01:05:26.020 | And that is basis of human intelligence.

01:05:29.800 | - Good predicates is the, the discovery of good predicates

01:05:33.740 | is the basis of human intelligence.

01:05:34.580 | - No, no, it is discovery of your understanding world.

01:05:39.580 | Of your methodology of understanding world.

01:05:43.560 | Because you have several function

01:05:47.260 | which you will apply to reality.

01:05:49.080 | - Can you say that again?

01:05:52.500 | So you're--

01:05:54.420 | - You have several functions, predicate.

01:05:57.560 | But they're abstract.

01:05:59.900 | Then you will apply them to reality, to your data.

01:06:04.340 | And you will create in this way predicate.

01:06:07.420 | Which is useful for your task.

01:06:09.660 | But predicate are not related specifically to your task,

01:06:16.420 | to this task, it is abstract functions.

01:06:20.100 | Which being applied to--

01:06:23.260 | - Many tasks that you might be interested in.

01:06:25.260 | - It might be many tasks, I don't know.

01:06:27.660 | - Well--

01:06:28.660 | - Different tasks.

01:06:29.940 | - Well they should be many tasks, right?

01:06:31.660 | I believe like, like in Propp case.

01:06:35.680 | It was for fairy tales, but it's happened everywhere.

01:06:38.540 | - Okay, so we talked about images a little bit,

01:06:42.180 | but can we talk about Noam Chomsky for a second?

01:06:45.780 | (laughing)

01:06:49.020 | - I believe I don't know him very well.

01:06:54.220 | - Personally, well--

01:06:55.660 | - Not personally, I don't know his ideas.

01:06:58.260 | - Well let me just say, do you think language,

01:07:01.020 | human language, is essential to expressing ideas,

01:07:05.780 | as Noam Chomsky believes?

01:07:08.340 | So like, language is at the core

01:07:10.140 | of our formation of predicates.

01:07:12.920 | It's like human language--

01:07:14.940 | - For me, language, and all the story of language,

01:07:18.580 | is very complicated.

01:07:20.740 | I don't understand this, and I'm not, I thought about--

01:07:25.740 | - Nobody does.

01:07:26.560 | - I'm not ready to work on that, because it's so huge.

01:07:30.780 | It is not for me, and I believe not for our century.

01:07:34.240 | - The 21st century.

01:07:37.340 | - Not for 21st century.

01:07:39.180 | - So--

01:07:40.020 | - We should learn something, a lot of stuff,

01:07:42.180 | from simple task, like digit recognition.

01:07:45.100 | - So you think, okay, you think digit recognition,

01:07:49.260 | 2D image, how would you more abstractly define

01:07:54.260 | digit recognition?

01:07:56.460 | It's 2D image, symbol recognition, essentially?

01:08:01.460 | I mean, I'm trying to get a sense,

01:08:08.100 | sort of thinking about it now,

01:08:09.700 | having worked with MNIST forever,

01:08:12.880 | how small of a subset is this,

01:08:16.020 | of the general vision recognition problem,

01:08:18.580 | and the general intelligence problem?

01:08:20.460 | Is it, yeah, is it a giant subset?

01:08:26.340 | Is it not?

01:08:27.820 | And how far away is language?

01:08:30.220 | - You know, let me refer to Einstein.

01:08:33.420 | Take the simplest problem, as simple as possible,

01:08:38.300 | but not simpler, and this is challenge,

01:08:41.780 | is simple problem, but it's simple by idea,

01:08:46.780 | but not simple to get it.

01:08:50.360 | When you will do this, you will find some predicate,

01:08:55.900 | which helps you to do it.

01:08:57.180 | - Well, yeah, I mean, with Einstein,

01:08:59.420 | you can, you look at general relativity,

01:09:04.140 | but that doesn't help you with quantum mechanics.

01:09:06.580 | - That's another story,

01:09:08.740 | you don't have any universal instrument.

01:09:11.840 | - Yeah, so I'm trying to wonder if,

01:09:15.380 | which space we're in, whether the,

01:09:17.540 | whether handwritten recognition is like general relativity,

01:09:21.140 | and then language is like quantum mechanics,

01:09:23.140 | so you're still gonna have to do a lot of mess

01:09:26.940 | to universalize it, but I'm trying to see,

01:09:31.940 | so what's your intuition why handwritten recognition

01:09:39.140 | is easier than language?

01:09:40.900 | Just, I think a lot of people would agree with that,

01:09:45.300 | but if you could elucidate sort of the intuition of why.

01:09:51.780 | - I don't, no, no, I don't think in this direction.

01:09:56.460 | I just think in the direction that this is problem,

01:09:59.560 | which if you will solve it well,

01:10:05.140 | we will create some abstract understanding of images.

01:10:12.740 | Maybe not all images.

01:10:19.700 | I would like to talk to guys who doing on real images

01:10:24.020 | in Columbia University.

01:10:26.260 | - What kind of images, unreal?

01:10:28.420 | - On real images. - Real images.

01:10:29.820 | - Yeah, what their idea is,

01:10:32.340 | the real predicate, what can be predicate.

01:10:35.140 | I still, symmetry will play a role in real life images,

01:10:40.140 | in any real life images, 2D images,

01:10:43.900 | let's talk about 2D images.

01:10:46.320 | Because that's what we know.

01:10:51.320 | A neural network was created for 2D images.

01:10:55.940 | - So the people I know in vision science, for example,

01:10:58.660 | the people who study human vision,

01:11:01.000 | that they usually go to the world of symbols

01:11:04.500 | and like handwritten recognition,

01:11:06.360 | but not really, it's other kinds of symbols

01:11:08.460 | to study our visual perception system.

01:11:11.560 | As far as I know, not much predicate type of thinking

01:11:15.180 | is understood about our vision system.

01:11:17.620 | - They did not think in this direction.

01:11:19.420 | - They don't, yeah, but how do you even begin

01:11:21.740 | to think in that direction?

01:11:23.500 | - That's, I would like to discuss with them.

01:11:26.900 | - Yeah.

01:11:27.740 | - Because if we will be able to show that it is worth working

01:11:32.740 | and theoretical scheme, it's not so bad.

01:11:40.340 | - So the unfortunate, so if we compare to language,

01:11:43.340 | language is like letters, a finite set of letters

01:11:46.520 | and a finite set of ways you can put together those letters,

01:11:50.500 | so it feels more amenable to kind of analysis.

01:11:53.720 | With natural images, there is so many pixels.

01:11:58.680 | - No, no, no, letter, language is much,

01:12:02.020 | much more complicated.

01:12:03.660 | It's involved a lot of different stuff.

01:12:08.020 | It's not just understanding of very simple class of tasks.

01:12:14.020 | I would like to see lists of tasks with language involved.

01:12:19.020 | - Yes, so there's a lot of nice benchmarks now

01:12:23.220 | in natural language processing from the very trivial,

01:12:26.480 | like understanding the elements of a sentence

01:12:30.180 | to question answering to much more complicated

01:12:33.060 | where you talk about open domain dialogue.

01:12:36.100 | The natural question is with handwritten recognition,

01:12:39.240 | it's really the first step of understanding

01:12:42.960 | visual information.

01:12:44.600 | - Right, but even our records show that we go

01:12:49.600 | in the wrong direction because we need 60,000 digits.

01:12:56.580 | - So even this first step, so forget about talking

01:12:59.660 | about the full journey, this first step should be taken

01:13:02.580 | in the right direction.

01:13:03.420 | - No, no, in the wrong direction

01:13:04.540 | because 60,000 is unacceptable.

01:13:07.180 | - No, I'm saying it should be taken in the right direction

01:13:11.020 | because 60,000 is not acceptable.

01:13:13.660 | - If you can talk, it's great, we have half percent of error.

01:13:18.480 | - And hopefully the step from doing handwritten recognition

01:13:22.760 | using very few examples, the step towards what babies do

01:13:26.840 | when they crawl and understand their physical environment.

01:13:29.240 | - I don't know what babies do.

01:13:30.200 | - I know you don't know about babies.

01:13:31.760 | - If you will do from very small examples,

01:13:36.080 | you will find principles which are different

01:13:40.560 | from what we're using now.

01:13:43.080 | And theoretically it's more or less clear.

01:13:48.360 | That means that you will use weak convergence,

01:13:52.280 | not just strong convergence.

01:13:54.480 | - Do you think these principles will naturally

01:13:59.280 | be human interpretable?

01:14:01.680 | - Oh yeah.

01:14:02.560 | - So like when we'll be able to explain them

01:14:04.480 | and have a nice presentation to show

01:14:06.240 | what those principles are?

01:14:07.600 | Or are they going to be very kind of abstract

01:14:12.600 | kinds of functions?

01:14:14.440 | - For example, I talked yesterday about symmetry.

01:14:17.680 | - Yes.

01:14:18.720 | - And I gave very simple examples.

01:14:20.440 | The same will be like that.

01:14:22.040 | - You gave like a predicate of a basic for--

01:14:24.680 | - For symmetries.

01:14:25.760 | - Yes, for different symmetries and you have for--

01:14:29.560 | - Degree of symmetry, that is important, not just symmetry.

01:14:33.680 | Exists or doesn't exist; degree of symmetry.

01:14:37.280 | - Yeah, for handwritten recognition.

01:14:40.240 | - No, it's not for handwritten, it's for images.

01:14:45.160 | But I would like apply to handwritten.
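
The idea of a degree of symmetry, as opposed to a yes-or-no test, can be illustrated with a small sketch. The function name `mirror_symmetry_degree` and the particular scoring formula are assumptions made for illustration; they are not the definition used in the lecture.

```python
def mirror_symmetry_degree(image):
    """Score left-right mirror symmetry of a 2D image given as a list of rows.

    Returns a value in [0, 1]: 1.0 when the image equals its mirror image,
    0.0 when the image and its mirror image do not overlap at all.
    Hypothetical measure, chosen only to illustrate a graded predicate.
    """
    diff = 0.0   # total absolute difference between image and mirror
    total = 0.0  # normalizer: combined intensity of image and mirror
    for row in image:
        mirrored = row[::-1]
        for a, b in zip(row, mirrored):
            diff += abs(a - b)
            total += abs(a) + abs(b)
    if total == 0:
        return 1.0  # an empty image is trivially symmetric
    return 1.0 - diff / total


if __name__ == "__main__":
    symmetric = [[0, 1, 0],
                 [1, 1, 1]]
    lopsided = [[1, 0, 0],
                [0, 1, 0]]
    print(mirror_symmetry_degree(symmetric))
    print(mirror_symmetry_degree(lopsided))
```

A score like this is computed from the image alone, without reference to any particular recognition task, which matches the description of predicates as abstract functions applied to data.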

01:14:47.720 | - Right, in theory it's more general.

01:14:49.760 | Okay, okay.

01:14:50.920 | So a lot of the things we've been talking about falls,

01:14:59.800 | we've been talking about philosophy a little bit,

01:15:01.840 | but also about mathematics and statistics.

01:15:05.520 | A lot of it falls into this idea,

01:15:08.080 | a universal idea of statistical theory of learning.

01:15:10.740 | What is the most beautiful and sort of powerful

01:15:16.800 | or essential idea you've come across,

01:15:19.120 | even just for yourself personally,

01:15:20.800 | in the world of statistics or statistical theory of learning?

01:15:25.480 | - Probably uniform convergence, which we did

01:15:29.520 | with Alexey Chervonenkis.

01:15:33.000 | - Can you describe uniform convergence?

01:15:36.040 | - You have law of large numbers.

01:15:38.980 | So for any function,

01:15:44.480 | average of function converges to expectation of function.

01:15:48.120 | But if you have set of functions,

01:15:50.520 | for any function it is true.

01:15:52.340 | But it should converge simultaneously

01:15:55.560 | for all set of functions.

01:15:57.500 | And for learning, you need,

01:16:04.960 | uniform convergence, just convergence is not enough.

01:16:08.540 | Because when you pick up one which gives minimum,

01:16:15.680 | you can pick up one function which does not converge

01:16:21.660 | and it will give you the best answer for this function.

01:16:28.020 | So you need uniform convergence to guarantee learning.

01:16:34.920 | So learning does not rely on trivial law of large numbers,

01:16:39.920 | it rely on uniform.
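
The gap between the ordinary and the uniform law can be seen in a small simulation. This is a minimal sketch under stated assumptions: labels are pure coin flips, so every function has true error 0.5, and the function class is a hypothetical collection of random lookup tables. The name `deviation_demo` and all parameter values are illustrative.

```python
import random


def empirical_risk(table, sample):
    """Fraction of sample points (x, y) that the lookup table misclassifies."""
    return sum(table[x] != y for x, y in sample) / len(sample)


def deviation_demo(n_samples=60, n_functions=2000, domain=200, seed=0):
    """Compare one fixed function against the worst case over many functions.

    Labels are independent coin flips, so the true risk of every function
    is exactly 0.5 and any deviation from 0.5 is pure sampling noise.
    """
    rng = random.Random(seed)
    sample = [(rng.randrange(domain), rng.randrange(2)) for _ in range(n_samples)]
    functions = [[rng.randrange(2) for _ in range(domain)]
                 for _ in range(n_functions)]
    risks = [empirical_risk(f, sample) for f in functions]

    single_gap = abs(risks[0] - 0.5)               # one function fixed in advance
    worst_gap = max(abs(r - 0.5) for r in risks)   # supremum over the whole class
    best_empirical = min(risks)                    # what risk minimization would pick
    return single_gap, worst_gap, best_empirical


if __name__ == "__main__":
    single_gap, worst_gap, best_empirical = deviation_demo()
    print("gap for one fixed function:", single_gap)
    print("largest gap over the class:", worst_gap)
    print("smallest empirical risk:   ", best_empirical)
```

The function with the smallest empirical risk looks better than chance on the sample even though nothing can be learned from coin flips. Selecting the minimizer picks out exactly the function whose average is furthest from its expectation, which is why convergence for each function separately does not guarantee learning.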

01:16:42.060 | But idea of the convergence exists

01:16:47.960 | in statistics for a long time.

01:16:50.680 | But it is interesting that,

01:16:56.860 | as I think about myself, how stupid I was 50 years,

01:17:04.920 | I did not see weak convergence.

01:17:07.320 | I work only on strong convergence.

01:17:10.960 | But now I think that most powerful is weak convergence.

01:17:15.280 | Because it makes admissible set of functions.

01:17:18.880 | And even in all proverbs,

01:17:22.720 | when people try to understand recognition

01:17:26.440 | about duck law, looks like a duck and so on,

01:17:30.280 | they use weak convergence.

01:17:32.400 | People in language, they understand this.

01:17:34.600 | But when we're trying to create artificial intelligence,

01:17:40.840 | we want to invent in different way.

01:17:45.080 | We just consider strong convergence.

01:17:48.780 | - So reducing the set of admissible functions,

01:17:52.720 | you think there should be effort put into

01:17:57.720 | understanding the properties of weak convergence?

01:18:01.280 | - You know, in classical mathematics,

01:18:04.760 | in Hilbert space, there are only two ways,

01:18:08.800 | two forms of convergence, strong and weak.

01:18:12.120 | Now we can use both.
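
For reference, the two modes of convergence in a Hilbert space H have standard definitions, sketched below. The symbols f_n, f_0, and \psi are assumed for illustration; reading the test functions \psi as the predicates discussed in this conversation is one interpretation, not a quotation.

```latex
% Strong convergence: convergence in the norm of H
\lim_{n \to \infty} \left\| f_n - f_0 \right\| = 0

% Weak convergence: convergence of inner products with every element of H
\lim_{n \to \infty} \left\langle f_n - f_0, \psi \right\rangle = 0
\quad \text{for all } \psi \in H
```

Strong convergence implies weak convergence, by the Cauchy-Schwarz inequality. The converse fails in infinite-dimensional spaces, so the two notions impose genuinely different requirements.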

01:18:15.760 | That means that we did everything.

01:18:19.600 | And it so happened, when we use Hilbert space,

01:18:27.800 | which is very rich space, space of continuous functions,

01:18:32.000 | which has an integral and square.

01:18:36.880 | So we can apply weak and strong convergence for learning

01:18:42.400 | and have closed form solution.

01:18:44.200 | So for computationally simple.

01:18:47.680 | For me, it is sign that it is right way.

01:18:51.080 | Because you don't need any heuristic here,

01:18:55.760 | yes, whatever you want.

01:18:57.720 | But now, the only what left,

01:19:02.520 | it is concept of what is predicate.

01:19:04.720 | - Of predicate.

01:19:05.560 | - But it is not statistics.

01:19:08.000 | - By the way, I like the fact that you think

01:19:09.760 | that heuristics are a mess that should be removed

01:19:13.280 | from the system.

01:19:14.920 | So closed form solution is the ultimate--

01:19:18.480 | - No, it so happened, that when you're using

01:19:20.840 | right instrument, you have closed form solution.

01:19:26.280 | - Do you think intelligence, human level intelligence,

01:19:31.280 | when we create it, will have something

01:19:35.760 | like a closed form solution?

01:19:41.400 | - You know, now I'm looking on bounds,

01:19:46.400 | which I gave, bounds for convergence.

01:19:49.560 | And when I looking for bounds,

01:19:53.880 | I thinking what is the most appropriate kernel

01:19:58.880 | for this bound would be.

01:20:01.000 | So we know that in, say, all our businesses,

01:20:07.480 | we use radial basis function.

01:20:09.720 | But looking on the bound, I think that I start to understand

01:20:16.120 | that maybe we need to make corrections

01:20:18.800 | to radial basis function to be closer

01:20:23.640 | to work better for this bounds.

01:20:28.440 | So I'm again trying to understand what type of kernel

01:20:32.560 | have best approximation,

01:20:37.560 | not an approximation, best fit to this bounds.
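
For readers unfamiliar with the term, the radial basis function kernel mentioned here is commonly taken to be the Gaussian kernel. The sketch below shows that standard form only; the function name `rbf_kernel` and the width parameter `gamma` are illustrative assumptions, and the corrections Vapnik alludes to are not specified in the conversation.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian radial basis function kernel: exp(-gamma * ||x - z||^2).

    x and z are equal-length sequences of numbers; gamma > 0 controls
    how quickly similarity decays with distance.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # identical points
    print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # distant points
```

The kernel depends only on the distance between the two points, which is what "radial" refers to.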

01:20:42.560 | - Sure, so there's a lot of interesting work

01:20:45.600 | that could be done in discovering better function

01:20:47.840 | than radial basis functions for--

01:20:50.120 | - Yeah, but--

01:20:50.960 | - For the bounds you find.

01:20:52.800 | - It still comes from, you're looking to math

01:20:57.800 | and trying to understand what--

01:21:00.240 | - From your own mind, looking at the--

01:21:02.240 | - Yeah, but--

01:21:03.080 | - I don't know--

01:21:03.920 | - Then I trying to understand what will be good for that.

01:21:08.920 | - Yeah, but to me there's still a beauty,

01:21:14.000 | again, maybe I'm a descendant of Alan Turing,

01:21:16.280 | to heuristics.

01:21:18.000 | To me, ultimately, intelligence will be

01:21:20.880 | a mess of heuristics.

01:21:22.340 | And that's the engineering answer, I guess.

01:21:26.320 | - Absolutely.

01:21:27.480 | When you're doing, say, self-driving cars,

01:21:31.080 | the great guy who will do that.

01:21:35.040 | It does not matter what theory behind that.

01:21:38.640 | Who has a better feeling how to apply it.

01:21:43.800 | But by the way, it is the same story about predicate.

01:21:50.420 | Because you cannot create rule for,

01:21:53.880 | situation is much more than you have rule for that.

01:21:56.680 | But maybe you can have more abstract rule

01:22:03.520 | than it will be less than zero.

01:22:07.740 | It is the same story about ideas

01:22:10.800 | and ideas applied to specific cases.

01:22:15.140 | - But still you should--

01:22:17.360 | - You cannot avoid this.

01:22:18.920 | - Yes, of course, but you should still reach

01:22:20.880 | for the ideas to understand the science.

01:22:22.920 | - Let me kind of ask,

01:22:25.280 | do you think neural networks or functions

01:22:29.360 | can be made to reason?

01:22:32.660 | Sort of what do you think,

01:22:35.520 | been talking about intelligence,

01:22:37.120 | but this idea of reasoning.

01:22:39.640 | There's an element of sequentially disassembling,

01:22:44.540 | interpreting the images.

01:22:48.420 | So when you think of handwritten recognition,

01:22:51.860 | we kind of think that there'll be a single,

01:22:55.240 | there's an input and output.

01:22:56.920 | There's not a recurrence.

01:22:58.640 | - Yeah.

01:23:01.080 | - What do you think about sort of the idea of recurrence,

01:23:04.440 | of going back to memory and thinking through this sort of

01:23:07.480 | sequentially mangling the different representations

01:23:12.480 | over and over until you arrive at a conclusion?

01:23:17.940 | Or is ultimately all that can be wrapped up into a function?

01:23:22.940 | - You're suggesting that let us use this type of algorithm.

01:23:28.460 | When I starting thinking,

01:23:31.060 | I first of all starting to understand what I want.

01:23:35.180 | Can I write down what I want?

01:23:39.560 | And then I trying to formalize.

01:23:45.020 | And when I do that, I think I have to solve this problem.

01:23:49.260 | And

01:23:52.980 | till now I did not see a situation where--

01:24:02.380 | - You need recurrence.

01:24:03.700 | - Recurrence.

01:24:04.540 | - But do you observe human beings?

01:24:07.860 | - Yeah.

01:24:08.700 | - Do you try to, it's the imitation question, right?

01:24:12.420 | It seems that human beings reason

01:24:14.900 | this kind of sequentially sort of,

01:24:19.620 | does that inspire in you a thought that we need to add that

01:24:24.140 | into our intelligent systems?

01:24:29.000 | You're saying, okay, I mean, you've kind of answered saying

01:24:34.440 | until now I haven't seen a need for it.

01:24:37.040 | And so because of that,

01:24:38.500 | you don't see a reason to think about it.

01:24:41.900 | - No, most of things I don't understand.

01:24:44.980 | In reasoning, in human, it is for me too complicated.

01:24:50.860 | For me, the most difficult part is to ask questions,

01:24:57.740 | good questions, how it works,

01:25:03.900 | how people asking questions.

01:25:06.820 | I don't know this.

01:25:11.720 | - You said that machine learning's not only

01:25:13.640 | about technical things, speaking of questions,

01:25:16.500 | but it's also about philosophy.

01:25:18.220 | So what role does philosophy play in machine learning?

01:25:23.500 | We talked about Plato, but generally thinking

01:25:26.860 | in this philosophical way,

01:25:29.980 | does it have, how does philosophy and math

01:25:33.860 | fit together in your mind?

01:25:35.240 | - So ideas and then their implementation.

01:25:39.500 | It's like predicate, like say admissible set of functions.

01:25:44.500 | It comes together, everything.

01:25:51.480 | Because the first iteration of theory

01:25:56.480 | was done 50 years ago, it all that, this is theory.

01:26:00.400 | So everything's there.

01:26:02.240 | If you have data, you can, and your set of function

01:26:08.080 | is not, has not big capacity.

01:26:13.080 | So low VC dimension, you can do that.

01:26:15.760 | You can make structural risk minimization, control capacity.

01:26:19.700 | But you was not able to make admissible

01:26:26.120 | set of function good.

01:26:27.980 | Now, when suddenly realize that we did not use

01:26:33.680 | another idea of convergence, which we can,

01:26:38.260 | everything comes together.

01:26:41.500 | - But those are mathematical notions.

01:26:43.340 | Philosophy plays a role of simply saying

01:26:48.020 | that we should be swimming in the space of ideas.

01:26:52.100 | - Let's talk what is philosophy.

01:26:54.320 | Philosophy means understanding of life.

01:26:56.860 | So understanding of life, say people like Plato,

01:27:03.500 | they understand on very high abstract level of life.

01:27:06.800 | So, and whatever I doing, it just implementation

01:27:12.660 | of my understanding of life.

01:27:15.660 | But every new step, it is very difficult.

01:27:21.360 | For example, to find this idea that we need

01:27:31.580 | weak convergence was not simple for me.

01:27:36.580 | - So that required thinking about life a little bit.

01:27:43.260 | Hard to trace, but there was some thought process.

01:27:48.860 | - You know, I working, I thinking about the same problem

01:27:52.980 | for 50 years or more.

01:27:55.420 | And again and again and again.

01:28:00.020 | I trying to be honest and that is very important.

01:28:02.660 | Not to be very enthusiastic, but concentrate

01:28:06.340 | on whatever we was not able to achieve.

01:28:09.460 | - Patient.

01:28:10.300 | - Yeah.

01:28:11.140 | And understand why.

01:28:13.360 | And now I understand that because I believe in math,

01:28:18.900 | I believe that in Wigner's idea.

01:28:23.740 | But now when I see that there are only two way

01:28:28.740 | of convergence and we using both,

01:28:32.060 | that means that we must do as well as people doing.

01:28:37.940 | But now exactly in philosophy and what we know

01:28:44.340 | about predicate, how we understand life,

01:28:47.020 | can we describe as a predicate.

01:28:50.100 | I thought about that and that is more or less obvious.

01:28:57.820 | Level of symmetry.

01:28:59.020 | But next, I have a feeling it's something about structures.

01:29:05.740 | But I don't know how to formulate,

01:29:11.820 | how to measure measure of structure and all that stuff.

01:29:16.180 | And the guy who will solve this challenge problem,

01:29:21.180 | then when we will looking how he did it,

01:29:27.060 | probably just only symmetry is not enough.

01:29:30.340 | - But something like symmetry will be there.

01:29:34.180 | - Oh yeah, absolutely, symmetry will be there.

01:29:37.580 | Level of symmetry will be there.

01:29:39.260 | And level of symmetry, anti-symmetry,

01:29:43.020 | diagonal, vertical, I even don't know how you can use

01:29:48.020 | in different direction idea of symmetry,

01:29:50.660 | it's very general.

01:29:52.300 | But it will be there.

01:29:54.940 | I think that people are very sensitive to idea of symmetry.

01:29:58.580 | But there are several ideas like symmetry.

01:30:02.940 | As I would like to learn.

01:30:07.020 | But you cannot learn just thinking about that.

01:30:11.820 | You should do challenging problems and then analyze them,

01:30:15.500 | why it was able to solve them.

01:30:20.220 | And then you will see.

01:30:22.740 | Very simple things, it's not easy to find.

01:30:25.420 | Even with talking about this every time.

01:30:30.460 | I was surprised, I tried to understand.

01:30:36.340 | Is people describe in language strong convergence

01:30:41.340 | mechanism for learning?

01:30:43.260 | I did not see, I don't know.

01:30:46.660 | But weak convergence, this duck story,

01:30:50.100 | and story like that, when you will explain to kid,

01:30:54.700 | you will use weak convergence argument.

01:30:57.620 | It looks like it does like this.

01:30:59.420 | But when you try to formalize, you're just ignoring this.

01:31:05.820 | Why, why 50 years from start of machine learning?

01:31:10.140 | - And that's the role of philosophers.

01:31:11.580 | - I think that might be, I don't know.

01:31:18.300 | Maybe this is serious.

01:31:19.980 | We should blame for that because

01:31:23.660 | empirical risk minimization, and all this stuff.

01:31:27.740 | If you read now textbooks, they just about bound

01:31:32.500 | about empirical risk minimization.

01:31:34.380 | They don't look for another problem like admissible set.

01:31:39.380 | - But on the topic of life,

01:31:45.060 | perhaps we, you could talk in Russian for a little bit.

01:31:50.020 | What's your favorite memory from childhood?

01:31:53.260 | (speaking in foreign language)

01:31:57.700 | - Music.

01:31:58.540 | - How about, can you try to answer in Russian?

01:32:02.660 | (speaking in foreign language)

01:32:07.580 | (speaking in foreign language)

01:32:11.860 | (speaking in foreign language)

01:32:15.900 | (speaking in foreign language)

01:32:20.660 | (speaking in foreign language)

01:32:24.580 | (speaking in foreign language)

01:32:29.100 | (speaking in foreign language)

01:32:33.020 | (speaking in foreign language)

01:32:37.900 | (speaking in foreign language)

01:33:05.580 | (speaking in foreign language)

01:33:09.500 | Now that we're talking about Bach,

01:33:13.100 | let's switch back to English

01:33:15.700 | 'cause I like Beethoven and Chopin, so.

01:33:17.740 | - Chopin, it's another music story.

01:33:21.340 | - But Bach, if we talk about predicates,

01:33:23.980 | Bach probably has the most sort of

01:33:28.980 | well-defined predicates and the like.

01:33:31.500 | - You know, it is very interesting to read

01:33:35.260 | what critics writing about Bach,

01:33:38.740 | which words they're using.

01:33:40.460 | They're trying to describe predicates.

01:33:42.860 | And then Chopin, it is very different vocabulary,

01:33:50.820 | very different predicates.

01:33:55.140 | And I think that if you will make collection of that,

01:34:02.700 | so maybe from this you can describe predicates

01:34:05.860 | for digit recognition as well.

01:34:07.660 | - From Bach and Chopin.

01:34:10.420 | - No, no, no, not from Bach and Chopin.

01:34:12.500 | - From the critic interpretation of the music, yeah.

01:34:15.220 | - When they're trying to explain music,

01:34:18.620 | what they use, they describe high-level ideas

01:34:24.740 | of Plato's ideas, what behind this music.

01:34:28.860 | - That's brilliant.

01:34:29.700 | So art is not self-explanatory in some sense.

01:34:34.700 | So you have to try to convert it into ideas.

01:34:39.060 | - It is ill-posed problems.

01:34:40.980 | When you go from ideas to the representation,

01:34:45.980 | it is easy way.

01:34:47.580 | But when you're trying to go back,

01:34:49.580 | it is ill-posed problems.

01:34:51.420 | But nevertheless, I believe that when you're looking

01:34:55.860 | from that, even from art, you will be able to find

01:35:00.300 | predicates for digit recognition.

01:35:02.060 | - That's such a fascinating and powerful notion.

01:35:07.620 | Do you ponder your own mortality?

01:35:10.580 | Do you think about it, do you fear it,

01:35:13.620 | do you draw insight from it?

01:35:15.060 | - About mortality?

01:35:18.220 | No, yeah.

01:35:20.620 | - Are you afraid of death?

01:35:25.820 | - Not too much.

01:35:26.900 | Not too much.

01:35:29.660 | It is pity that I will not be able to do something

01:35:33.740 | which I think I have a feeling to do that.

01:35:38.740 | For example, I will be very happy to work with guys,

01:35:44.460 | theoretician from music, to write this collection

01:35:52.060 | of description, how they describe music,

01:35:55.060 | how they use the predicate.

01:35:56.940 | And from art as well, then take what is in common

01:36:01.940 | and try to understand predicate,

01:36:06.180 | which is absolute for everything.

01:36:08.700 | - And then use that for visual recognition,

01:36:10.500 | see if there is a connection.

01:36:12.620 | - Exactly.

01:36:13.580 | - Ah, there's still time, we got time.

01:36:15.580 | (laughing)

01:36:19.380 | We got time.

01:36:20.220 | - It takes years and years and years.

01:36:24.060 | - I think so.

01:36:25.060 | - It's a long way.

01:36:26.460 | - Well, see, you've got the patient mathematician's mind.

01:36:30.900 | I think it could be done very quickly and very beautifully.

01:36:34.060 | I think it's a really elegant idea.

01:36:35.820 | - Yeah, but also--

01:36:36.940 | - Some of many.

01:36:37.780 | - You know, the most time, it is not to make

01:36:41.900 | this collection, to understand what is common

01:36:46.260 | to think about that once again and again and again.

01:36:49.500 | - Again and again and again, but I think sometimes,

01:36:52.660 | especially just when you say this idea now,

01:36:55.700 | even just putting together the collection

01:36:58.780 | and looking at the different sets of data,

01:37:03.300 | language, trying to interpret music,

01:37:05.500 | criticize music, and images,

01:37:08.740 | I think there'll be sparks of ideas that'll come.

01:37:10.940 | Of course, again and again, you'll come up

01:37:12.660 | with better ideas, but even just that notion

01:37:15.820 | is a beautiful notion.

01:37:16.940 | - I even have some example.

01:37:21.580 | So I have friend who was specialist in Russian poetry.

01:37:26.580 | She is professor of Russian poetry.

01:37:35.260 | She did not write poems, but she know a lot of stuff.

01:37:40.260 | She make book, several books, and one of them

01:37:49.300 | is a collection of Russian poetry.

01:37:53.500 | She have images of Russian poetry.

01:37:57.100 | She collect all images of Russian poetry.

01:37:59.340 | And I ask her to do following.

01:38:03.420 | You have MNIST, digit recognition,

01:38:08.500 | and we get 100 digits, or maybe less than 100.

01:38:14.660 | I don't remember, maybe 50 digits.

01:38:18.860 | And try from poetical point of view,

01:38:21.660 | describe every image which she see,

01:38:25.260 | using only words of images of Russian poetry.

01:38:30.260 | And she did it.

01:38:32.220 | And then we tried to,

01:38:37.580 | I call it learning using privileged information.

01:38:43.620 | I call it privileged information.

01:38:45.900 | You have on two languages.

01:38:48.060 | One language is just image of digit,

01:38:53.060 | and another language poetic description of this image.

01:38:56.620 | And this is privileged information.

01:39:00.060 | And there is a algorithm when you're working

01:39:04.500 | using privileged information, you're doing better.

01:39:07.460 | Much better, so.

01:39:10.380 | - So there's something there.

01:39:11.580 | - Something there.

01:39:12.860 | And there is, and you see,

01:39:16.980 | she unfortunately died.

01:39:19.020 | The collection of digits in poetic descriptions

01:39:25.900 | of these digits.

01:39:27.260 | - So there's something there in that poetic description.

01:39:32.900 | - But I think that there is an abstract ideas

01:39:37.900 | on the Plato level of ideas.

01:39:40.700 | - Yeah, that they're there, that could be discovered.

01:39:43.140 | And music seems to be a good entry point.

01:39:45.060 | - As soon as we start this challenge problem.

01:39:50.060 | - The challenge problem.

01:39:51.180 | - It immediately connected to all this stuff.

01:39:55.420 | - Especially with your talk and this podcast,

01:39:58.060 | and I'll do whatever I can to advertise it.

01:40:00.100 | It's such a clean, beautiful Einstein-like formulation

01:40:03.260 | of the challenge before us.

01:40:05.220 | - Right.

01:40:06.060 | - Let me ask another absurd question.

01:40:09.500 | We talked about mortality.

01:40:12.780 | We talked about philosophy of life.

01:40:14.660 | What do you think is the meaning of life?

01:40:16.660 | What's the predicate for mysterious existence

01:40:22.540 | here on Earth?

01:40:23.980 | - I don't know.

01:40:30.580 | It's very interesting.

01:40:34.740 | We have in Russia, I don't know if you know,

01:40:39.980 | the guy Strugatsky.

01:40:43.100 | They are writing fiction, they're thinking about

01:40:47.860 | human, what's going on.

01:40:49.740 | And they have idea that there are,

01:40:56.660 | they're developing two type of people.

01:41:01.860 | Common people and very smart people.

01:41:05.100 | They just started.

01:41:06.100 | And these two branches of people

01:41:09.860 | will go in different direction very soon.

01:41:13.180 | So that's what they're thinking about.

01:41:15.940 | (laughing)

01:41:18.220 | - So the purpose of life is to create two paths.

01:41:23.220 | - Two paths.

01:41:24.660 | - Of human societies.

01:41:25.980 | - Yes.

01:41:27.020 | Simple people and more complicated people.

01:41:29.980 | - Which do you like best?

01:41:31.540 | The simple people or the complicated ones?

01:41:34.500 | - I don't know.

01:41:35.340 | That is just his fantasy.

01:41:38.260 | But you know, every week we have guy

01:41:41.700 | who is just writer and also

01:41:46.700 | so it's called literature.

01:41:50.820 | And he explain how he understand literature

01:41:56.580 | and human relationship, how he see life.

01:42:00.300 | And I understood that I'm just small kids

01:42:05.980 | comparing to him.

01:42:09.500 | He's very smart guy in understanding life.

01:42:12.620 | He knows this predicate, he knows big blocks of life.

01:42:18.860 | I am amazed every time when I listen to him.

01:42:23.300 | And he just talking about literature.

01:42:27.380 | And I think that I was surprised.

01:42:31.380 | So the managers in big companies,

01:42:39.180 | most of them are guys who study English language

01:42:44.180 | and English literature.

01:42:50.020 | So why?

01:42:52.500 | Because they understand life.

01:42:54.820 | They understand models.

01:42:57.020 | And among them, maybe many talented critics

01:43:01.700 | which just analyzing this.

01:43:06.660 | And this is big science like Propp did.

01:43:10.500 | This is blocks.

01:43:12.380 | Yeah, very smart.

01:43:15.340 | - It amazes me that you are and continue to be humbled

01:43:21.500 | by the brilliance of others.

01:43:22.940 | - I'm very modest about myself.

01:43:25.540 | I see so smart guys around.

01:43:28.940 | - Well, let me be immodest for you.

01:43:31.740 | You're one of the greatest mathematicians,

01:43:33.900 | statisticians of our time.

01:43:35.820 | It's truly an honor.

01:43:36.980 | Thank you for talking again.

01:43:38.580 | And let's talk.

01:43:39.500 | - It is not.

01:43:42.140 | - Yeah, let's talk.

01:43:43.460 | - I know my limits.

01:43:44.860 | - Let's talk again when your challenge is taken on

01:43:49.140 | and solved by a grad student.

01:43:51.860 | Especially-- - Let's talk again.

01:43:53.900 | - When they use it. - I hope this happens.

01:43:56.140 | - Maybe music will be involved.

01:43:58.900 | Vladimir, thank you so much.

01:43:59.900 | It's been an honor. - Thank you very much.

01:44:02.620 | - Thanks for listening to this conversation

01:44:04.220 | with Vladimir Vapnik.

01:44:05.540 | And thank you to our presenting sponsor, Cash App.

01:44:08.780 | Download it, use code LEXPODCAST.

01:44:11.420 | You'll get $10 and $10 will go to FIRST,

01:44:14.340 | an organization that inspires and educates young minds

01:44:17.060 | to become science and technology innovators of tomorrow.

01:44:20.740 | If you enjoy this podcast, subscribe on YouTube,

01:44:23.500 | give us five stars on Apple Podcast,

01:44:25.340 | support on Patreon, or simply connect with me on Twitter

01:44:28.820 | at Lex Fridman.

01:44:30.300 | And now let me leave you with some words

01:44:33.500 | from Vladimir Vapnik.

01:44:35.580 | When solving a problem of interest,

01:44:37.740 | do not solve a more general problem as an intermediate step.

01:44:41.660 | Thank you for listening.

01:44:44.340 | I hope to see you next time.

01:44:46.220 | (upbeat music)

01:44:48.820 | (upbeat music)


Vladimir Vapnik: Predicates, Invariants, and the Essence of Intelligence | Lex Fridman Podcast #71
