Dileep George: Brain-Inspired AI | Lex Fridman Podcast #115

00:00:00.000 | The following is a conversation with Dalipe George,

00:00:03.440 | a researcher at the intersection of neuroscience

00:00:06.200 | and artificial intelligence,

00:00:07.960 | co-founder of Vicarious with Scott Phoenix,

00:00:11.000 | and formerly co-founder of Numenta with Jeff Hawkins,

00:00:14.800 | who's been on this podcast, and Donna Dubinsky.

00:00:18.880 | From his early work on hierarchical temporal memory

00:00:21.760 | to recursive cortical networks to today,

00:00:24.760 | Dalipe's always sought to engineer intelligence

00:00:27.640 | that is closely inspired by the human brain.

00:00:31.240 | As a side note, I think we understand very little

00:00:34.200 | about the fundamental principles

00:00:35.920 | underlying the function of the human brain,

00:00:38.920 | but the little we do know gives hints

00:00:41.200 | that may be more useful for engineering intelligence

00:00:43.680 | than any idea in mathematics, computer science, physics,

00:00:47.160 | and scientific fields outside of biology.

00:00:50.520 | And so the brain is a kind of existence proof

00:00:52.680 | that says it's possible.

00:00:54.520 | Keep at it.

00:00:56.320 | I should also say that brain-inspired AI is often overhyped

00:01:00.600 | and use this fodder, just as quantum computing

00:01:03.680 | for a marketing speak,

00:01:05.960 | but I'm not afraid of exploring these

00:01:08.080 | sometimes overhyped areas since where there's smoke,

00:01:11.560 | there's sometimes fire.

00:01:12.900 | Quick summary of the ads.

00:01:15.360 | Three sponsors, Babbel, Raycon Earbuds, and Masterclass.

00:01:19.960 | Please consider supporting this podcast

00:01:21.760 | by clicking the special links in the description

00:01:24.640 | to get the discount.

00:01:25.880 | It really is the best way to support this podcast.

00:01:29.160 | If you enjoy this thing, subscribe on YouTube,

00:01:31.560 | review it with the five stars on Apple Podcasts,

00:01:33.760 | support on Patreon, or connect with me on Twitter

00:01:36.560 | at Lex Friedman.

00:01:37.900 | As usual, I'll do a few minutes of ads now

00:01:41.180 | and never any ads in the middle

00:01:42.440 | that can break the flow of the conversation.

00:01:45.340 | This show is sponsored by Babbel,

00:01:47.500 | an app and website that gets you speaking

00:01:50.000 | in a new language within weeks.

00:01:52.040 | Go to babbel.com and use code Lex to get three months free.

00:01:55.740 | They offer 14 languages, including Spanish, French,

00:01:59.220 | Italian, German, and yes, Russian.

00:02:03.140 | Daily lessons are 10 to 15 minutes, super easy, effective,

00:02:07.040 | designed by over 100 language experts.

00:02:10.700 | Let me read a few lines from the Russian poem,

00:02:13.540 | "Noch, ulitza, fanar, apteka," by Alexander Bloch,

00:02:18.360 | that you'll start to understand if you sign up to Babbel.

00:02:22.140 | (speaking in foreign language)

00:02:26.060 | Now, I say that you'll only start to understand this poem

00:02:39.000 | because Russian starts with a language

00:02:41.560 | and ends with a vodka.

00:02:43.700 | Now, the latter part is definitely not endorsed

00:02:46.360 | or provided by Babbel

00:02:47.800 | and will probably lose me the sponsorship,

00:02:50.080 | but once you graduate from Babbel,

00:02:51.960 | you can enroll in my advanced course

00:02:53.600 | of late night Russian conversation over vodka.

00:02:56.500 | I have not yet developed an app for that.

00:02:59.140 | It's in progress.

00:03:00.660 | So get started by visiting babbel.com

00:03:02.940 | and use code Lex to get three months free.

00:03:05.560 | This show is sponsored by Raycon earbuds.

00:03:09.500 | Get them at buyraycon.com/lex.

00:03:12.580 | They've become my main method of listening to podcasts,

00:03:15.060 | audio books, and music when I run, do pushups and pull-ups,

00:03:18.820 | or just living life.

00:03:20.740 | In fact, I often listen to Brown Noise with them

00:03:23.300 | when I'm thinking deeply about something,

00:03:25.380 | it helps me focus.

00:03:27.060 | They're super comfortable, pair easily, great sound,

00:03:30.700 | great bass, six hours of playtime.

00:03:34.080 | I've been putting in a lot of miles to get ready

00:03:36.060 | for a potential ultra marathon

00:03:38.220 | and listening to audio books on World War II.

00:03:41.500 | The sound is rich and really comes in clear.

00:03:45.940 | So again, get them at buyraycon.com/lex.

00:03:50.140 | This show is sponsored by Masterclass.

00:03:52.780 | Sign up at masterclass.com/lex to get a discount

00:03:55.860 | and to support this podcast.

00:03:58.060 | When I first heard about Masterclass,

00:03:59.620 | I thought it was too good to be true.

00:04:01.620 | I still think it's too good to be true.

00:04:03.820 | For 180 bucks a year, you get an all access pass

00:04:06.760 | to watch courses from, to list some of my favorites,

00:04:10.140 | Chris Hadfield on space exploration,

00:04:12.460 | Neil deGrasse Tyson on scientific thinking

00:04:14.340 | and communication, Will Wright,

00:04:16.500 | creator of SimCity and Sims on game design.

00:04:19.380 | Every time I do this read,

00:04:21.180 | I really want to play a city builder game.

00:04:24.420 | Carlos Santana on guitar, Garry Kasparov on chess,

00:04:27.960 | Daniel Negrano on poker and many more.

00:04:30.780 | Chris Hadfield explaining how rockets work

00:04:33.500 | and the experience of being launched into space alone

00:04:36.200 | is worth the money.

00:04:37.540 | By the way, you can watch it on basically any device.

00:04:40.740 | Once again, sign up at masterclass.com to get a discount

00:04:43.700 | and to support this podcast.

00:04:45.900 | And now here's my conversation with Dilip George.

00:04:49.700 | Do you think we need to understand the brain

00:04:52.600 | in order to build it?

00:04:54.300 | - Yes, if you want to build the brain,

00:04:56.260 | we definitely need to understand how it works.

00:04:58.660 | So Blue Brain or Henry Markram's project

00:05:02.380 | is trying to build a brain without understanding it,

00:05:05.740 | like just trying to put details of the brain

00:05:10.700 | from neuroscience experiments into a giant simulation.

00:05:16.100 | By putting more and more neurons, more and more details.

00:05:19.060 | But that is not going to work

00:05:21.500 | because when it doesn't perform as what you expect it to do,

00:05:26.500 | then what do you do?

00:05:28.060 | You do, you just keep adding more details.

00:05:29.980 | How do you debug it?

00:05:30.940 | So unless you understand,

00:05:33.980 | unless you have a theory about

00:05:35.820 | how the system is supposed to work,

00:05:37.380 | how the pieces are supposed to fit together,

00:05:39.380 | what they're going to contribute, you can't build it.

00:05:42.300 | - At the functional level, understand.

00:05:44.300 | So can you actually linger on

00:05:46.140 | and describe the Blue Brain project?

00:05:48.660 | It's kind of a fascinating principle,

00:05:52.820 | an idea to try to simulate the brain.

00:05:55.860 | We're talking about the human brain, right?

00:05:57.740 | - Right, human brains and rat brains or cat brains

00:06:02.340 | have lots in common.

00:06:03.540 | That the cortex, the neocortex structure is very similar.

00:06:08.220 | So initially they were trying to just simulate a cat brain

00:06:13.780 | and--

00:06:14.860 | - To understand the nature of evil.

00:06:17.300 | - To understand the nature of evil.

00:06:19.420 | Or as it happens in most of the simulations,

00:06:22.740 | you easily get one thing out, which is oscillations.

00:06:27.700 | If you simulate a large number of neurons, they oscillate.

00:06:32.340 | And you can adjust the parameters and say that,

00:06:35.420 | oh, oscillations match the rhythm

00:06:37.980 | that we see in the brain, et cetera.

00:06:39.740 | But--

00:06:40.580 | - Oh, I see.

00:06:41.420 | So the idea is, is the simulation

00:06:44.340 | at the level of individual neurons?

00:06:46.900 | - Yeah, so the Blue Brain project,

00:06:49.260 | the original idea as proposed was,

00:06:51.980 | you put very detailed biophysical neurons,

00:06:57.260 | biophysical models of neurons.

00:07:00.460 | And you interconnect them according to the statistics

00:07:04.940 | of connections that we have found

00:07:06.420 | from real neuroscience experiments.

00:07:08.540 | And then turn it on.

00:07:10.740 | And see what happens.

00:07:13.100 | And these neural models are incredibly complicated

00:07:16.700 | in themselves, right?

00:07:17.940 | Because these neurons are modeled using this idea

00:07:22.940 | called Hodgkin-Huxley models,

00:07:24.580 | which are about how signals propagate in a cable.

00:07:28.260 | And there are active dendrites, all those phenomena,

00:07:31.860 | which those phenomena themselves,

00:07:34.060 | we don't understand that well.

00:07:36.020 | And then we put in connectivity,

00:07:38.700 | which is part guesswork, part observed.

00:07:42.580 | And of course, if we do not have any theory

00:07:44.620 | about how it is supposed to work,

00:07:46.580 | we just have to take whatever comes out of it as,

00:07:52.500 | okay, this is something interesting.

00:07:54.900 | - But in your sense, these models

00:07:56.780 | of the way signal travels along,

00:07:59.620 | like with the axons and all the basic models,

00:08:01.940 | they're too crude?

00:08:04.500 | - Oh, well, actually, they are pretty detailed

00:08:07.380 | and pretty sophisticated.

00:08:10.340 | And they do replicate the neural dynamics.

00:08:14.620 | If you take a single neuron,

00:08:16.460 | and you try to turn on the different channels,

00:08:20.980 | the calcium channels and the different receptors,

00:08:24.780 | and see what the effect of turning on

00:08:27.820 | or off those channels are in the neuron's spike output,

00:08:32.740 | people have built pretty sophisticated models of that.

00:08:35.420 | And they are, I would say, in the regime of correct.

00:08:40.420 | - Well, see, the correctness, that's interesting,

00:08:43.300 | 'cause you've mentioned it at several levels.

00:08:45.860 | The correctness is measured

00:08:46.980 | by looking at some kind of aggregate statistics?

00:08:49.580 | - It would be more the spiking dynamics of the--

00:08:53.260 | - Spiking dynamics of the single neuron, okay.

00:08:55.020 | - Yeah, and yeah, these models,

00:08:57.980 | because they are going to the level of mechanism, right?

00:09:00.780 | So they are basically looking at,

00:09:02.620 | okay, what is the effect of turning on an ion channel?

00:09:06.660 | And you can model that using electric circuits.

00:09:10.980 | And then, so they are model,

00:09:13.420 | so it is not just a function fitting,

00:09:16.460 | it is people are looking at the mechanism underlying it,

00:09:19.260 | and putting that in terms of electric circuit theory,

00:09:23.540 | signal propagation theory, and modeling that.

00:09:26.340 | And so those models are sophisticated,

00:09:29.420 | but getting a single neurons model 99% right

00:09:34.420 | does not still tell you how to,

00:09:38.460 | you know, it would be the analog

00:09:39.700 | of getting a transistor model right,

00:09:42.980 | and now trying to build a microprocessor.

00:09:46.300 | And if you just observe, you know,

00:09:49.220 | if you did not understand how a microprocessor works,

00:09:52.500 | but you say, oh, I now can model one transistor well,

00:09:56.180 | and now I will just try to interconnect the transistors

00:10:00.140 | according to whatever I could, you know,

00:10:02.100 | guess from the experiments and try to simulate it,

00:10:06.300 | then it is very unlikely

00:10:08.100 | that you will produce a functioning microprocessor.

00:10:11.500 | You want to, you know,

00:10:12.340 | when you want to produce a functioning microprocessor,

00:10:14.700 | you want to understand Boolean logic,

00:10:16.860 | how does, how do the gates work, all those things,

00:10:20.220 | and then, you know,

00:10:21.300 | understand how do those gates get implemented

00:10:23.020 | using transistors.

00:10:23.980 | - Yeah, there's actually,

00:10:25.220 | I remember this reminds me, there's a paper,

00:10:27.140 | maybe you're familiar with it,

00:10:29.020 | that I remember going through in a reading group

00:10:31.420 | that approaches a microprocessor

00:10:34.540 | from a perspective of a neuroscientist.

00:10:36.780 | I think it basically,

00:10:39.140 | it uses all the tools that we have of neuroscience

00:10:42.380 | to try to understand,

00:10:43.620 | like as if we just aliens showed up to study computers,

00:10:47.820 | - Yeah.

00:10:48.660 | - And to see if those tools can be used

00:10:50.940 | to get any kind of sense of how the microprocessor works.

00:10:54.500 | I think the final, the takeaway from,

00:10:57.860 | at least this initial exploration is that we're screwed.

00:11:02.380 | There's no way that the tools of neuroscience

00:11:04.140 | would be able to get us to anything,

00:11:06.340 | like not even Boolean logic.

00:11:07.940 | I mean, it's just,

00:11:09.700 | any aspect of the architecture of the function

00:11:16.280 | of the processes involved,

00:11:19.500 | the clocks, the timing, all that,

00:11:21.820 | you can't figure that out from the tools of neuroscience.

00:11:24.480 | - Yeah, so I'm very familiar with this particular paper.

00:11:27.700 | I think it was called,

00:11:30.340 | Can a Neuroscientist Understand a Microprocessor?

00:11:34.140 | - Yeah.

00:11:34.980 | - Something like that.

00:11:35.820 | Following the methodology in that paper,

00:11:38.580 | even an electrical engineer

00:11:40.020 | would not understand microprocessors.

00:11:41.460 | So I could, so,

00:11:42.300 | (both laughing)

00:11:44.380 | So I don't think it is that bad in the sense of saying,

00:11:49.220 | neuroscientists do find valuable things

00:11:53.100 | by observing the brain.

00:11:55.200 | They do find good insights,

00:11:58.360 | but those insights cannot be put together

00:12:01.680 | just as a simulation.

00:12:03.280 | You have to investigate

00:12:06.080 | what are the computational underpinnings

00:12:08.760 | of those findings.

00:12:10.380 | How do all of them fit together

00:12:13.040 | from an information processing perspective?

00:12:16.080 | You have to, somebody has to painstakingly

00:12:20.040 | put those things together and build hypothesis.

00:12:22.920 | So I don't want to diss all of neuroscientists saying,

00:12:25.700 | oh, they're not finding anything.

00:12:26.780 | No, that paper almost went to that level of,

00:12:29.760 | neuroscientists will never understand.

00:12:32.900 | No, that's not true.

00:12:34.220 | I think they do find lots of useful things,

00:12:36.740 | but it has to be put together in a computational framework.

00:12:39.940 | - Yeah, I mean, but just the AI systems

00:12:43.580 | will be listening to this podcast 100 years from now,

00:12:46.500 | and they will probably,

00:12:47.780 | there's some non-zero probability

00:12:50.820 | they'll find your words laughable.

00:12:52.620 | They're like, I remember humans thought

00:12:55.020 | they understood something about the brain

00:12:56.940 | and they were totally clueless.

00:12:58.200 | There's a sense about neuroscience

00:12:59.780 | that we may be in the very, very early days

00:13:02.140 | of understanding the brain.

00:13:04.260 | But I mean, that's one perspective.

00:13:07.100 | In your perspective,

00:13:10.140 | how far are we into understanding

00:13:12.900 | any aspect of the brain?

00:13:18.140 | So the dynamics of the individual neurocommunication

00:13:22.000 | to the, how in a collective sense,

00:13:26.660 | how they're able to store information,

00:13:29.100 | transfer information,

00:13:30.820 | how the intelligence then emerges,

00:13:32.580 | all that kind of stuff.

00:13:33.420 | Where are we on that timeline?

00:13:35.100 | - Yeah, so timelines are very, very hard to predict,

00:13:38.500 | and you can, of course, be wrong.

00:13:40.860 | And it can be wrong on either side.

00:13:44.140 | We know that when we look back,

00:13:47.620 | the first flight was in 1903.

00:13:51.060 | In 1900, there was a New York Times article

00:13:54.860 | on flying machines that do not fly.

00:13:57.940 | And humans might not fly for another 100 years.

00:14:02.700 | That was what that article stated.

00:14:04.780 | And so, but no, they flew three years after that.

00:14:08.380 | So it's very hard to, so-

00:14:11.540 | - Well, and on that point, one of the Wright brothers,

00:14:15.140 | I think two years before,

00:14:18.380 | said that, he said some number, like 50 years,

00:14:22.900 | he has become convinced that it's impossible.

00:14:27.900 | - Even during their experimentation, yeah, yeah, yeah.

00:14:31.860 | - I mean, that's a tribute to when,

00:14:34.140 | that's like the entrepreneurial battle of depression,

00:14:37.140 | of going through just thinking this is impossible.

00:14:40.700 | But there, yeah, there's something,

00:14:42.660 | even the person that's in it

00:14:44.380 | is not able to see, estimate correctly.

00:14:47.420 | - Exactly, but I can tell from the point of,

00:14:50.500 | objectively, what are the things that we know about the brain

00:14:53.620 | and how that can be used to build AI models,

00:14:57.140 | which can then go back and inform how the brain works.

00:15:00.820 | So my way of understanding the brain

00:15:02.820 | would be to basically say,

00:15:04.220 | look at the insights neuroscientists have found,

00:15:07.180 | understand that from a computational angle,

00:15:11.140 | information processing angle, build models using that.

00:15:15.300 | And then building that model, which functions,

00:15:19.420 | which is a functional model,

00:15:20.500 | which is doing the task that we want the model to do.

00:15:23.780 | It is not just trying to model a phenomena in the brain.

00:15:26.780 | It is trying to do what the brain is trying to do

00:15:29.220 | on the whole functional level.

00:15:32.340 | And building that model will help you

00:15:34.940 | fill in the missing pieces that,

00:15:37.460 | biology just gives you the hints.

00:15:39.340 | And building the model,

00:15:41.820 | fills in the rest of the pieces of the puzzle.

00:15:44.700 | And then you can go and connect that back to biology

00:15:47.660 | and say, okay, now it makes sense

00:15:49.580 | that this part of the brain is doing this,

00:15:53.660 | or this layer in the cortical circuit is doing this.

00:15:57.260 | And then continue this iteratively,

00:16:01.100 | because now that will inform new experiments

00:16:04.020 | in neuroscience.

00:16:05.020 | And of course, building the model

00:16:07.100 | and verifying that in the real world

00:16:08.900 | will also tell you more about,

00:16:11.780 | does the model actually work?

00:16:13.540 | And you can refine the model,

00:16:14.980 | find better ways of putting

00:16:17.260 | these neuroscience insights together.

00:16:19.580 | So I would say it is,

00:16:21.780 | so neuroscientists alone, just from experimentation,

00:16:27.180 | will not be able to build a model of the brain,

00:16:30.180 | or a functional model of the brain.

00:16:31.860 | So there's lots of efforts,

00:16:35.380 | which are very impressive efforts

00:16:36.540 | in collecting more and more connectivity data

00:16:40.340 | from the brain.

00:16:41.580 | How are the microcircuits of the brain

00:16:44.860 | connected with each other?

00:16:45.700 | - Those are beautiful, by the way.

00:16:47.220 | - Those are beautiful.

00:16:48.340 | And at the same time, those do not itself,

00:16:54.020 | by themselves, convey the story of how does it work.

00:16:57.820 | And somebody has to understand,

00:16:59.900 | okay, why are they connected like that?

00:17:01.740 | And what are those things doing?

00:17:04.580 | And we do that by building models in AI,

00:17:08.100 | using hints from neuroscience, and repeat the cycle.

00:17:11.980 | - So what aspect of the brain

00:17:15.820 | are useful in this whole endeavor?

00:17:17.700 | Which, by the way, I should say,

00:17:19.300 | you're both a neuroscientist and an AI person.

00:17:23.140 | I guess the dream is to both understand the brain

00:17:25.460 | and to build AGI systems.

00:17:27.460 | So you're, it's like an engineer's perspective

00:17:31.860 | of trying to understand the brain.

00:17:33.860 | So what aspects of the brain, functionally speaking,

00:17:37.220 | like you said, do you find interesting?

00:17:38.820 | - Yeah, quite a lot of things.

00:17:40.460 | So one is, if you look at the visual cortex,

00:17:45.180 | and the visual cortex is a large part of the brain.

00:17:50.860 | I forgot the exact fraction,

00:17:52.420 | but it's a huge part of our brain area

00:17:55.580 | is occupied by just vision.

00:17:59.100 | So vision, visual cortex is not just

00:18:01.660 | a feed-forward cascade of neurons.

00:18:05.820 | There are a lot more feedback connections in the brain

00:18:08.340 | compared to the feed-forward connections.

00:18:10.020 | And it is surprising to the level of detail

00:18:14.340 | neuroscientists have actually studied this.

00:18:16.380 | If you go into neuroscience literature

00:18:18.500 | and poke around and ask, have they studied

00:18:21.860 | what will be the effect of poking a neuron

00:18:24.740 | in level IT in level V1?

00:18:29.100 | And have they studied that?

00:18:32.500 | And you will say, yes, they have studied that.

00:18:35.380 | - So every possible combination has been studied.

00:18:38.820 | - I mean, it's not a random exploration at all.

00:18:41.620 | It's very hypothesis-driven, right?

00:18:44.140 | They are very, experimental neuroscientists

00:18:46.420 | are very, very systematic in how they probe the brain.

00:18:49.340 | Because experiments are very costly to conduct.

00:18:52.100 | They take a lot of preparation.

00:18:53.580 | They need a lot of control.

00:18:55.180 | So they are very hypothesis-driven

00:18:57.340 | in how they probe the brain.

00:18:58.580 | And often what I find is that when we have a question

00:19:03.900 | in AI about, has anybody probed

00:19:07.780 | how lateral connections in the brain works?

00:19:10.620 | And when you go and read the literature,

00:19:11.980 | yes, people have probed it,

00:19:13.140 | and people have probed it very systematically.

00:19:15.260 | And they have hypothesis about how those lateral connections

00:19:20.100 | are supposedly contributing to visual processing.

00:19:24.540 | But of course, they haven't built

00:19:26.220 | very, very functional detailed models of it.

00:19:28.820 | - By the way, how do they, in those studies,

00:19:30.460 | sorry to interrupt, do they stimulate like a neuron

00:19:33.780 | in one particular area of the visual cortex

00:19:36.540 | and then see how the signal travels kind of thing?

00:19:39.740 | - Fascinating, very, very fascinating experiments.

00:19:41.940 | So I can give you one example I was impressed with.

00:19:44.660 | This is, so before going to that,

00:19:46.900 | let me give you an overview

00:19:50.780 | of how the layers in the cortex are organized.

00:19:53.580 | Visual cortex is organized

00:19:56.060 | into roughly four hierarchical levels.

00:19:58.460 | Okay, so V1, V2, V4, IT.

00:20:01.700 | And in V1-

00:20:03.180 | - What happened to V3?

00:20:04.700 | - Well, yeah, there's another pathway.

00:20:06.780 | Okay, so there is, this is,

00:20:08.260 | I'm talking about just object recognition pathway.

00:20:10.580 | - All right, cool.

00:20:11.740 | - And then in V1 itself,

00:20:14.780 | so it's, there is a very detailed microcircuit in V1 itself.

00:20:19.420 | There is organization within a level itself.

00:20:22.580 | The cortical sheet is organized into multiple layers,

00:20:26.980 | and there are columnar structure.

00:20:28.860 | And this layer-wise and columnar structure

00:20:32.580 | is repeated in V1, V2, V4, IT, all of them, right?

00:20:37.580 | And the connections between these layers within a level,

00:20:41.500 | in V1 itself, there are six layers, roughly,

00:20:44.980 | and the connections between them,

00:20:46.500 | there is a particular structure to them.

00:20:48.700 | And now, so one example of an experiment people did is,

00:20:53.700 | when you present a stimulus, which is,

00:21:00.180 | let's say, requires separating the foreground

00:21:03.940 | from the background of an object.

00:21:05.660 | So it's a textured triangle on a textured background.

00:21:10.660 | And you can check, does the surface settle first,

00:21:15.700 | or does the contour settle first?

00:21:17.540 | - Settle?

00:21:19.940 | - Settle in the sense that the,

00:21:21.780 | so when you finally form the percept of the triangle,

00:21:28.220 | you understand where the contours of the triangle are,

00:21:31.140 | and you also know where the inside of the triangle is,

00:21:33.980 | right, that's when you form the final percept.

00:21:36.500 | Now, you can ask, what is the dynamics

00:21:40.060 | of forming that final percept?

00:21:41.860 | Do the neurons first find the edges

00:21:48.060 | and converge on where the edges are,

00:21:51.780 | and then they find the inner surfaces,

00:21:54.740 | or does it go the other way around?

00:21:55.940 | - The other way around.

00:21:56.900 | So what's the answer?

00:21:58.460 | - In this case, it turns out that it first settles

00:22:02.260 | on the edges, it converges on the edge hypothesis first,

00:22:06.100 | and then the surfaces are filled in

00:22:09.100 | from the edges to the inside.

00:22:11.140 | - That's fascinating.

00:22:12.180 | - And the detail to which you can study this,

00:22:15.900 | it's amazing that you can actually not only find

00:22:20.020 | the temporal dynamics of when this happens,

00:22:23.300 | and then you can also find which layer in V1,

00:22:27.300 | which layer is encoding the edges,

00:22:30.580 | which layer is encoding the surfaces,

00:22:33.100 | and which layer is encoding the feedback,

00:22:36.060 | which layer is encoding the feedforward,

00:22:37.620 | and what's the combination of them

00:22:39.620 | that produces the final percept.

00:22:41.220 | And these kinds of experiments stand out

00:22:44.580 | when you try to explain illusions.

00:22:47.100 | One example of a favorite illusion of mine

00:22:50.140 | is the Kanitsa triangle,

00:22:51.220 | I don't know that you are familiar with this one.

00:22:53.260 | So this is an example where it's a triangle,

00:22:58.220 | but only the corners of the triangle

00:23:01.500 | are shown in the stimulus.

00:23:04.100 | So they look like kind of Pac-Man.

00:23:06.300 | - Oh, the black Pac-Man, yeah.

00:23:08.780 | - And then your visual system hallucinates the edges.

00:23:13.020 | And when you look at it, you will see a faint edge.

00:23:17.660 | And you can go inside the brain and look,

00:23:21.820 | do actually neurons signal the presence of this edge?

00:23:25.700 | And if they signal, how do they do it?

00:23:28.580 | Because they are not receiving anything from the input.

00:23:32.100 | The input is black for those neurons, right?

00:23:35.340 | So how do they signal it?

00:23:37.540 | When does the signaling happen?

00:23:39.180 | So if a real contour is present in the input,

00:23:43.460 | then the neurons immediately signal,

00:23:46.780 | oh, okay, there is an edge here.

00:23:49.220 | When it is an illusory edge,

00:23:51.940 | it is clearly not in the input,

00:23:53.580 | it is coming from the context.

00:23:55.620 | So those neurons fire later.

00:23:57.980 | And you can say that, okay,

00:24:00.020 | it's the feedback connections that is causing them to fire.

00:24:03.460 | And they happen later,

00:24:06.100 | and you can find the dynamics of them.

00:24:08.980 | So these studies are pretty impressive and very detailed.

00:24:13.380 | - So by the way, just a step back,

00:24:16.940 | you said that there may be more feedback connections

00:24:20.220 | than feedforward connections.

00:24:21.740 | First of all, just for like a machine learning folks,

00:24:26.740 | I mean, that's crazy

00:24:29.380 | that there's all these feedback connections.

00:24:32.180 | We often think about,

00:24:34.220 | thanks to deep learning,

00:24:38.460 | you start to think about the human brain

00:24:41.780 | as a kind of feedforward mechanism.

00:24:44.780 | So what the heck are these feedback connections?

00:24:48.660 | - Yeah.

00:24:49.500 | - What's the dynamics?

00:24:52.260 | What are we supposed to think about them?

00:24:54.060 | - Yeah, so this fits into a very beautiful picture

00:24:57.180 | about how the brain works, right?

00:24:59.380 | So the beautiful picture of how the brain works

00:25:02.340 | is that our brain is building a model of the world.

00:25:06.420 | I know, so our visual system is building a model

00:25:10.380 | of how objects behave in the world.

00:25:12.860 | And we are constantly projecting

00:25:15.260 | that model back onto the world.

00:25:17.180 | So what we are seeing is not just a feedforward thing

00:25:21.300 | that just gets interpreted in a feedforward part.

00:25:23.860 | We are constantly projecting our expectations

00:25:26.180 | onto the world.

00:25:27.220 | And what the final percept is a combination

00:25:30.300 | of what we project onto the world,

00:25:32.620 | combined with what the actual sensory input is.

00:25:36.860 | - Almost like trying to calculate the difference

00:25:39.140 | and then trying to interpret the difference.

00:25:40.980 | - Yeah, I wouldn't put it as calculating the difference.

00:25:44.580 | It's more like what is the best explanation

00:25:46.860 | for the input stimulus based on the model

00:25:51.140 | of the world I have?

00:25:52.460 | - Got it, got it.

00:25:53.900 | And that's where all the illusions come in.

00:25:56.060 | But that's an incredibly efficient process.

00:26:00.060 | So the feedback mechanism, it just helps you constantly,

00:26:04.780 | yeah, to hallucinate how the world should be

00:26:07.460 | based on your world model.

00:26:08.540 | And then just looking at if there's novelty,

00:26:13.540 | like trying to explain it.

00:26:16.460 | Hence, that's why movement,

00:26:18.820 | we detect movement really well.

00:26:20.380 | There's all these kinds of things.

00:26:21.740 | And this is like at all different levels

00:26:24.740 | of the cortex you're saying.

00:26:26.860 | This happens at the lowest level, at the highest level.

00:26:29.300 | - Yes, yeah.

00:26:30.260 | In fact, feedback connections are more prevalent

00:26:32.340 | in everywhere in the cortex.

00:26:33.820 | And so one way to think about it,

00:26:37.340 | and there's a lot of evidence for this,

00:26:38.700 | is inference.

00:26:41.180 | So basically, if you have a model of the world,

00:26:44.180 | and when some evidence comes in,

00:26:47.540 | what you are doing is inference.

00:26:49.700 | You are trying to now explain this evidence

00:26:53.420 | using your model of the world.

00:26:55.460 | And this inference includes projecting your model

00:26:59.060 | onto the evidence and taking the evidence

00:27:02.620 | back into the model and doing an iterative procedure.

00:27:06.780 | And this iterative procedure is what happens

00:27:10.340 | using the feedforward feedback propagation.

00:27:13.180 | And feedback affects what you see in the world,

00:27:16.340 | and it also affects feedforward propagation.

00:27:18.780 | And examples are everywhere.

00:27:21.220 | We see these kinds of things everywhere.

00:27:24.660 | The idea that there can be multiple competing hypothesis

00:27:28.460 | in our model, trying to explain the same evidence,

00:27:32.580 | and then you have to kind of make them compete.

00:27:35.940 | And one hypothesis will explain away the other hypothesis

00:27:40.660 | through this competition process.

00:27:42.260 | - Wait, what?

00:27:43.100 | So you have competing models of the world

00:27:47.020 | that try to explain, what do you mean by explain away?

00:27:50.180 | - So this is a classic example in graphical models,

00:27:54.100 | probabilistic models.

00:27:55.620 | So if you-- - Oh, what are those?

00:27:58.620 | - Okay. (laughs)

00:28:01.100 | - I think it's useful to mention

00:28:02.500 | because we'll talk about them more.

00:28:04.260 | - Yeah, yeah.

00:28:05.180 | So neural networks are one class

00:28:08.580 | of machine learning models.

00:28:10.900 | You have distributed set of nodes,

00:28:13.820 | which are called the neurons.

00:28:15.780 | Each one is doing a dot product

00:28:17.260 | and you can approximate any function

00:28:19.060 | using this multilevel network of neurons.

00:28:22.460 | So that's a class of models

00:28:24.780 | which are useful for function approximation.

00:28:27.500 | There is another class of models in machine learning

00:28:30.740 | called probabilistic graphical models.

00:28:33.140 | And you can think of them as each node in that model

00:28:38.140 | is variable, which is talking about something.

00:28:41.980 | It can be a variable representing,

00:28:44.540 | is an edge present in the input or not?

00:28:48.100 | And at the top of the network,

00:28:52.020 | a node can be representing,

00:28:55.260 | is there an object present in the world or not?

00:28:58.380 | And then, so it is another way of encoding knowledge.

00:29:03.380 | And then once you encode the knowledge,

00:29:08.260 | you can do inference in the right way.

00:29:12.500 | What is the best way to explain some set of evidence

00:29:17.500 | using this model that you encoded?

00:29:19.660 | So when you encode the model,

00:29:21.020 | you are encoding the relationship

00:29:22.780 | between these different variables.

00:29:23.860 | How is the edge connected to the model of the object?

00:29:27.660 | How is the surface connected to the model of the object?

00:29:30.060 | And then, of course,

00:29:32.700 | this is a very distributed, complicated model.

00:29:34.900 | And inference is, how do you explain a piece of evidence

00:29:39.900 | when a set of stimulus comes in?

00:29:41.420 | If somebody tells me there is a 50% probability

00:29:44.460 | that there is an edge here in this part of the model,

00:29:46.900 | how does that affect my belief on whether I should think

00:29:51.180 | that there should be a square present in the image?

00:29:54.460 | So this is the process of inference.

00:29:56.940 | So one example of inference is having this

00:30:00.500 | expanding effect between multiple causes.

00:30:03.180 | So graphical models can be used

00:30:06.140 | to represent causality in the world.

00:30:08.940 | So let's say, you know, your alarm at home

00:30:13.940 | can be triggered by a burglar getting into your house,

00:30:21.220 | or it can be triggered by an earthquake.

00:30:24.860 | Both can be causes of the alarm going off.

00:30:27.900 | So now, you're in your office,

00:30:31.380 | you heard burglar alarm going off,

00:30:33.380 | you are heading home,

00:30:35.380 | thinking that there's a burglar, got it.

00:30:37.500 | But while driving home, if you hear on the radio

00:30:39.860 | that there was an earthquake in the vicinity,

00:30:42.460 | now your strength of evidence

00:30:45.980 | for a burglar getting into their house is diminished.

00:30:50.180 | Because now that piece of evidence is explained

00:30:53.380 | by the earthquake being present.

00:30:55.660 | So if you think about these two causes

00:30:57.700 | explaining at lower level variable, which is alarm,

00:31:00.940 | now what we're seeing is that

00:31:03.340 | increasing the evidence for some cause,

00:31:05.580 | there is evidence coming in from below

00:31:08.660 | for alarm being present.

00:31:10.380 | And initially it was flowing to a burglar being present,

00:31:14.300 | but now since there is side evidence for this other cause,

00:31:19.300 | it explains away this evidence

00:31:21.140 | and evidence will now flow to the other cause.

00:31:23.260 | This is two competing causal things

00:31:26.580 | trying to explain the same evidence.

00:31:28.380 | - And the brain has a similar kind of mechanism

00:31:31.460 | for doing so.

00:31:32.460 | That's kind of interesting.

00:31:33.820 | How's that all encoded in the brain?

00:31:38.460 | Like where's the storage of information?

00:31:40.380 | Are we talking, just maybe to get it

00:31:43.740 | a little bit more specific,

00:31:45.620 | is it in the hardware of the actual connections?

00:31:48.220 | Is it in chemical communication?

00:31:51.020 | Is it electrical communication?

00:31:53.180 | Do we know?

00:31:54.340 | - So this is a paper that we are bringing out soon.

00:31:57.900 | - Which one is this?

00:31:59.220 | - This is the cortical microcircuits paper

00:32:01.300 | that I sent you a draft of.

00:32:03.220 | Of course, a lot of it is still hypothesis.

00:32:06.100 | One hypothesis is that you can think of a cortical column

00:32:09.980 | as encoding a concept.

00:32:13.060 | A concept, think of it as an example of a concept

00:32:18.060 | is an edge present or not,

00:32:22.180 | or is an object present or not.

00:32:25.180 | Okay, so you can think of it as a binary variable,

00:32:27.420 | a binary random variable,

00:32:28.860 | the presence of an edge or not,

00:32:30.460 | or the presence of an object or not.

00:32:32.140 | So each cortical column can be thought of

00:32:34.380 | as representing that one concept, one variable.

00:32:38.140 | And then the connections between these cortical columns

00:32:41.420 | are basically encoding the relationship

00:32:43.820 | between these random variables.

00:32:45.700 | And then there are connections within the cortical column.

00:32:49.460 | Each cortical column is implemented

00:32:51.180 | using multiple layers of neurons

00:32:53.180 | with very, very, very rich structure there.

00:32:57.780 | There are thousands of neurons in a cortical column.

00:33:00.700 | - But that structure is similar

00:33:02.140 | across the different cortical columns.

00:33:03.620 | - Correct, correct.

00:33:04.940 | And also these cortical columns

00:33:06.820 | connect to a substructure called thalamus.

00:33:09.220 | So all cortical columns pass through this substructure.

00:33:14.220 | So our hypothesis is that

00:33:17.300 | the connections between the cortical columns

00:33:18.980 | implement this, that's where the knowledge is stored

00:33:22.900 | about how these different concepts connect to each other.

00:33:27.300 | And then the neurons inside this cortical column

00:33:30.860 | and in thalamus in combination

00:33:33.060 | implement this actual computations in data for inference,

00:33:38.060 | which includes explaining away

00:33:39.980 | and competing between the different hypothesis.

00:33:43.740 | And it is all very,

00:33:46.100 | so what is amazing is that

00:33:47.540 | neuroscientists have actually done experiments

00:33:51.940 | to the tune of showing these things.

00:33:54.460 | They might not be putting it

00:33:55.620 | in the overall inference framework,

00:33:58.620 | but they will show things like,

00:34:00.460 | if I poke this higher level neuron,

00:34:03.260 | it will inhibit through this complicated loop

00:34:06.020 | through the thalamus,

00:34:06.860 | it will inhibit this other column.

00:34:08.580 | So they will do such experiments.

00:34:11.940 | - Do they use terminology of concepts, for example?

00:34:14.780 | So, I mean,

00:34:16.980 | is it something where,

00:34:21.420 | it's easy to anthropomorphize and think about concepts,

00:34:25.140 | like you started moving into logic-based

00:34:28.260 | kind of reasoning systems.

00:34:29.860 | So how would you think of concepts in that kind of way?

00:34:34.140 | Or is it a lot messier, a lot more gray area,

00:34:39.140 | you know, even more gray,

00:34:43.420 | even more messy than the artificial neural network kinds,

00:34:47.620 | kinds of abstractions?

00:34:48.460 | - Easiest way to think of it is a variable, right?

00:34:50.820 | It's a binary variable,

00:34:52.140 | which is showing the presence or absence of something.

00:34:55.900 | - But I guess what I'm asking is,

00:34:58.100 | is that something that we're supposed to think of something

00:35:01.900 | that's human interpretable, of that something?

00:35:04.180 | - It doesn't need to be.

00:35:05.500 | It doesn't need to be human interpretable.

00:35:07.100 | There's no need for it to be human interpretable.

00:35:09.740 | But it's almost like,

00:35:12.700 | you will be able to find some interpretation of it

00:35:17.700 | because it is connected to the other things

00:35:21.220 | that you know about.

00:35:22.100 | - And the point is it's useful somehow.

00:35:25.460 | It's useful as an entity in the graphic,

00:35:30.460 | in connecting to the other entities that are,

00:35:33.020 | let's call them concepts.

00:35:34.740 | Okay, so by the way, are these the cortical microcircuits?

00:35:40.020 | - Correct, these are the cortical microcircuits.

00:35:42.300 | That's what neuroscientists use to talk about the circuits

00:35:45.780 | within a level of the cortex.

00:35:49.140 | So you can think of, let's think of a neural network,

00:35:53.060 | artificial neural network terms.

00:35:54.820 | People talk about the architecture of the,

00:35:56.660 | so how many layers they build,

00:35:59.860 | what is the fan in, fan out, et cetera.

00:36:01.820 | That is the macro architecture.

00:36:03.380 | And then within a layer of the neural network,

00:36:08.500 | you can, the cortical neural network

00:36:11.860 | is much more structured within a level.

00:36:14.620 | There's a lot more intricate structure there.

00:36:17.300 | But even within an artificial neural network,

00:36:20.380 | you can think of feature detection plus pooling as one level.

00:36:24.900 | And so that is kind of a microcircuit.

00:36:27.500 | It's much more complex in the real brain.

00:36:30.980 | So within a level, whatever is that circuitry

00:36:35.500 | within a column of the cortex

00:36:37.660 | and between the layers of the cortex,

00:36:39.340 | that's the microcircuitry.

00:36:41.060 | - I love that terminology.

00:36:42.660 | Machine learning people don't use the circuit terminology,

00:36:45.940 | but they should.

00:36:46.980 | It's nice.

00:36:47.940 | So, okay.

00:36:48.780 | Okay, so that's the cortical microcircuits.

00:36:53.980 | So what's interesting about, what can we say,

00:36:57.300 | what is the paper that you're working on,

00:37:00.820 | propose about the ideas around these cortical microcircuits?

00:37:04.420 | - So this is a fully functional model

00:37:08.060 | for the microcircuits of the visual cortex.

00:37:10.780 | - So the paper focuses, and your idea in our discussions now

00:37:13.700 | is focusing on vision.

00:37:15.660 | - Yeah.

00:37:16.500 | - The visual cortex.

00:37:18.420 | Okay.

00:37:19.260 | So this is a model, this is a full model.

00:37:20.540 | This is how vision works.

00:37:23.020 | - Well, this is a model of--

00:37:24.820 | - A hypothesis.

00:37:25.660 | - Yeah.

00:37:26.500 | - A hypothesis.

00:37:27.340 | - Okay, so let me step back a bit.

00:37:29.500 | So we looked at neuroscience for insights

00:37:32.660 | on how to build a vision model.

00:37:35.020 | - Right.

00:37:35.860 | - And we synthesized all those insights

00:37:38.300 | into a computational model.

00:37:39.780 | This is called the recursive cortical network model

00:37:41.980 | that we used for breaking CAPTCHAs.

00:37:45.140 | And we are using the same model for robotic picking

00:37:48.860 | and tracking of objects.

00:37:50.860 | - And that again is a vision system.

00:37:52.500 | - That's a vision system.

00:37:53.340 | - Computer vision system.

00:37:54.540 | - That's a computer vision system.

00:37:55.500 | - Takes in images and outputs what?

00:37:59.260 | - On one side, it outputs the class of the image

00:38:02.340 | and also segments the image.

00:38:05.620 | And you can also ask it further queries.

00:38:07.420 | Where is the edge of the object?

00:38:08.860 | Where is the interior of the object?

00:38:10.500 | So it's a model that you build to answer multiple questions.

00:38:15.100 | So you're not trying to build a model

00:38:16.580 | for just classification or just segmentation, et cetera.

00:38:19.540 | It's a joint model that can do multiple things.

00:38:23.820 | And so that's the model that we built

00:38:27.980 | using insights from neuroscience.

00:38:30.380 | And some of those insights are,

00:38:32.300 | what is the role of feedback connections?

00:38:34.220 | What is the role of lateral connections?

00:38:36.060 | So all those things went into the model.

00:38:37.940 | The model actually uses feedback connections.

00:38:40.860 | - All these ideas from neuroscience.

00:38:43.020 | - Yeah.

00:38:43.860 | - So what the heck is a recursive cortical network?

00:38:46.420 | Like what are the architecture approaches,

00:38:49.180 | interesting aspects here,

00:38:50.740 | which is essentially a brain inspired approach

00:38:54.900 | to a computer vision?

00:38:56.500 | - Yeah.

00:38:57.340 | So there are multiple layers to this question.

00:38:59.420 | I can go from the very, very top and then zoom in.

00:39:03.420 | So one important thing, constraint that went into the model

00:39:07.060 | is that you should not think vision,

00:39:10.060 | think of vision as something in isolation.

00:39:12.900 | We should not think perception as something

00:39:16.060 | as a pre-processor for cognition.

00:39:18.100 | Perception and cognition are interconnected.

00:39:21.700 | And so you should not think of one problem

00:39:24.140 | in separation from the other problem.

00:39:26.220 | And so that means if you finally want to have a system

00:39:29.860 | that understand concepts about the world

00:39:32.460 | and can learn in a very conceptual model of the world

00:39:35.660 | and can reason and connect to language,

00:39:37.700 | all of those things,

00:39:38.820 | you need to have, think all the way through

00:39:41.660 | and make sure that your perception system

00:39:44.140 | is compatible with your cognition system

00:39:46.340 | and language system and all of them.

00:39:48.220 | And one aspect of that is top-down controllability.

00:39:51.740 | - What does that mean?

00:39:54.460 | - So that means,

00:39:55.300 | - In this context.

00:39:56.140 | - So think of, you can close your eyes

00:39:59.340 | and think about the details of one object.

00:40:02.220 | I can zoom in further and further.

00:40:05.540 | So think of the bottle in front of me.

00:40:08.220 | And now you can think about,

00:40:11.460 | okay, what the cap of that bottle looks.

00:40:14.220 | I know we can think about what's the texture

00:40:16.060 | on that bottle of the cap.

00:40:19.340 | You can think about what will happen if something hits that.

00:40:24.660 | So you can manipulate your visual knowledge

00:40:29.180 | in cognition driven ways.

00:40:32.060 | - Yes.

00:40:32.900 | - And so this top-down controllability

00:40:36.380 | and being able to simulate scenarios in the world.

00:40:40.580 | - So you're not just a passive player

00:40:44.340 | in this perception game.

00:40:45.820 | You can control it.

00:40:47.420 | You have imagination.

00:40:49.500 | - Correct, correct.

00:40:50.620 | So basically, having a generative network,

00:40:54.900 | which is a model,

00:40:55.940 | and it is not just some arbitrary generative network.

00:40:58.820 | It has to be built in a way that it is controllable top-down.

00:41:03.020 | It is not just trying to generate a whole picture at once.

00:41:06.780 | It's not trying to generate

00:41:09.060 | photorealistic things of the world.

00:41:10.500 | You don't have good photorealistic models of the world.

00:41:13.300 | Human brains do not have.

00:41:14.300 | If I, for example, ask you the question,

00:41:17.100 | what is the color of the letter E in the Google logo?

00:41:20.660 | You have no idea.

00:41:23.300 | - No idea.

00:41:24.140 | - You probably have seen it millions of times.

00:41:25.820 | (laughing)

00:41:26.660 | Or not millions of times, hundreds of times.

00:41:28.060 | (laughing)

00:41:28.900 | So it's not, our model is not photorealistic.

00:41:31.300 | But it has other properties that we can manipulate it.

00:41:35.100 | And you can think about filling in

00:41:37.940 | a different color in that logo.

00:41:39.460 | You can think about expanding the letter E.

00:41:42.100 | So you can imagine the consequence of actions

00:41:47.620 | that you have never performed.

00:41:48.780 | So these are the kind of characteristics

00:41:51.100 | the generative model need to have.

00:41:52.740 | So this is one constraint that went into our model.

00:41:55.140 | So this is, when you read the,

00:41:57.460 | just the perception side of the paper,

00:41:59.180 | it is not obvious that this was a constraint

00:42:01.380 | into the, that went into the model,

00:42:03.220 | this top-down controllability of the generative model.

00:42:06.500 | - So what does top-down controllability in a model look like?

00:42:11.500 | It's a really interesting concept, fascinating concept.

00:42:15.140 | What is that?

00:42:15.980 | Is that the recursiveness gives you that?

00:42:18.700 | Or how do you do it?

00:42:21.220 | - Quite a few things.

00:42:22.060 | It's like, what does the model factor,

00:42:24.860 | factorize, what are the,

00:42:26.620 | what is the model representing as different pieces

00:42:29.100 | in the puzzle?

00:42:29.940 | So in the RCN network, it thinks of the world,

00:42:34.940 | so what I say, the background of an image

00:42:38.420 | is modeled separately from the foreground of the image.

00:42:41.700 | So the objects are separate from the background.

00:42:44.220 | They are different entities.

00:42:45.380 | - So there's a kind of segmentation

00:42:46.980 | that's built in fundamentally to the structure.

00:42:49.260 | And then even that object is composed of parts.

00:42:53.020 | And also, another one is the shape of the object

00:42:56.700 | is differently modeled from the texture of the object.

00:43:00.900 | - Got it.

00:43:02.900 | So there's like these, I've been,

00:43:06.020 | you know who Francois Chollet is?

00:43:08.540 | - Yeah, yeah.

00:43:09.380 | - He's, so there's, he developed this IQ test

00:43:13.460 | type of thing for ARC challenge for,

00:43:16.460 | and it's kind of cool that there's these concepts,

00:43:20.620 | priors that he defines that you bring to the table

00:43:24.020 | in order to be able to reason about basic shapes

00:43:26.540 | and things in an IQ test.

00:43:28.660 | So here you're making it quite explicit

00:43:31.580 | that here are the things that you should be,

00:43:34.900 | these are like distinct things

00:43:36.700 | that you should be able to model in this.

00:43:40.140 | - Keep in mind that you can derive this

00:43:42.820 | from much more general principles.

00:43:44.500 | It doesn't, you don't need to explicitly put it as,

00:43:47.260 | oh, objects versus foreground versus background,

00:43:50.860 | the surface versus texture.

00:43:52.220 | No, these are derivable from more fundamental principles

00:43:56.100 | of how, you know, what's the property

00:43:59.300 | of continuity of natural signals?

00:44:01.060 | - What's the property of continuity of natural signals?

00:44:05.500 | - Yeah.

00:44:06.340 | - By the way, that sounds very poetic, but yeah.

00:44:08.860 | So you're saying that's a,

00:44:10.540 | there's some low-level properties

00:44:12.740 | from which emerges the idea that shapes

00:44:14.700 | should be different than,

00:44:16.540 | like there should be a parts of an object,

00:44:18.620 | there should be, I mean,

00:44:19.980 | - Exactly.

00:44:20.820 | - Kind of like Francois, I mean, there's objectness,

00:44:23.500 | there's all these things that it's kind of crazy

00:44:25.980 | that we humans, I guess, evolved to have

00:44:29.620 | because it's useful for us to perceive the world.

00:44:32.020 | - Correct, correct.

00:44:32.860 | And it derives mostly from the properties

00:44:35.300 | of natural signals.

00:44:36.380 | And so-

00:44:39.020 | - Natural signals.

00:44:40.460 | So natural signals are the kind of things

00:44:42.540 | we'll perceive in the natural world.

00:44:44.740 | - Correct.

00:44:45.580 | - I don't know why that sounds so beautiful,

00:44:47.540 | natural signals, yeah.

00:44:48.700 | - As opposed to a QR code, right?

00:44:50.660 | Which is an artificial signal that we created.

00:44:52.780 | Humans are not very good at classifying QR codes.

00:44:55.620 | We are very good at saying something is a cat or a dog,

00:44:58.460 | but not very good at, you know,

00:45:00.100 | where computers are very good at classifying QR codes.

00:45:03.900 | So our visual system is tuned for natural signals.

00:45:08.460 | And there are fundamental assumptions in the architecture

00:45:11.340 | that are derived from natural signals properties.

00:45:15.140 | - I wonder when you take a hallucinogenic drugs,

00:45:18.340 | does that go into natural or is that closer to QR code?

00:45:22.620 | - It's still natural.

00:45:23.900 | - It's still natural?

00:45:24.780 | - Yeah, because it is still operating using our brains.

00:45:28.060 | - By the way, on that topic, I mean,

00:45:30.140 | I haven't been following,

00:45:31.340 | I think they're becoming legalized in certain,

00:45:33.260 | I can't wait until they become legalized to a degree

00:45:36.980 | that you, like vision science researchers could study it.

00:45:40.100 | - Yeah.

00:45:40.940 | - And then through medical, chemical ways, modify it.

00:45:45.860 | There could be ethical concerns,

00:45:47.060 | but that's another way to study the brain

00:45:49.820 | is to be able to chemically modify it.

00:45:53.180 | There's probably very long a way

00:45:56.980 | to figure out how to do it ethically.

00:45:59.180 | - Yeah, but I think there are studies on that already.

00:46:02.500 | - Already?

00:46:03.340 | - Yeah, I think so.

00:46:04.260 | Because it's not unethical to give it to rats.

00:46:08.100 | - Oh, that's true, that's true.

00:46:09.500 | (both laughing)

00:46:12.180 | There's a lot of drugged up rats out there.

00:46:14.420 | Okay, cool, sorry, sorry to, so okay.

00:46:16.540 | So there's these low level things from natural signals

00:46:21.540 | that can--

00:46:26.580 | - From which these properties will emerge.

00:46:28.660 | - Yes.

00:46:29.500 | - But it is still a very hard problem

00:46:32.660 | on how to encode that.

00:46:34.580 | So you don't, there is no,

00:46:36.580 | so you mentioned the priors Franscho wanted to encode

00:46:41.580 | in the abstract reasoning challenge,

00:46:45.020 | but it is not straightforward how to encode those priors.

00:46:48.780 | So some of those challenges,

00:46:51.140 | like the object recognition, completion challenges

00:46:53.860 | are things that we purely use our visual system to do.

00:46:57.260 | It looks like abstract reasoning,

00:46:59.540 | but it is purely an output of the vision system.

00:47:02.460 | For example, completing the corners

00:47:04.340 | of that Kaninsa triangle,

00:47:05.460 | completing the lines of that Kaninsa triangle.

00:47:07.340 | It's a purely a visual system property.

00:47:09.540 | There is no abstract reasoning involved.

00:47:11.220 | It uses all these priors,

00:47:13.140 | but it is stored in our visual system in a particular way

00:47:17.140 | that is amenable to inference.

00:47:19.180 | And that is one of the things that we tackled in the,

00:47:24.180 | basically saying, okay, these are the prior knowledge

00:47:27.180 | which will be derived from the world,

00:47:29.500 | but then how is that prior knowledge represented

00:47:32.580 | in the model such that inference,

00:47:35.380 | when some piece of evidence comes in,

00:47:37.340 | can be done very efficiently and in a very distributed way.

00:47:40.980 | Because it is very,

00:47:43.300 | there are so many ways of representing knowledge,

00:47:45.500 | which is not amenable to very quick inference,

00:47:49.140 | you know, quick lookups.

00:47:50.820 | And so that's one core part of what we tackled

00:47:55.260 | in the RCN model.

00:47:57.460 | How do you encode visual knowledge

00:47:59.820 | to do very quick inference?

00:48:02.180 | And yeah.

00:48:03.020 | - Can you maybe comment on,

00:48:04.540 | so folks listening to this in general

00:48:07.100 | may be familiar with different kinds of architectures

00:48:09.980 | of neural networks.

00:48:11.140 | What are we talking about with the RCN?

00:48:14.620 | What does the architecture look like?

00:48:17.260 | What are the different components?

00:48:18.860 | Is it close to neural networks?

00:48:20.140 | Is it far away from neural networks?

00:48:21.940 | What does it look like?

00:48:22.780 | - Yeah, so you can think of the delta between the model

00:48:27.140 | and a convolutional neural network,

00:48:28.700 | if people are familiar with convolutional neural networks.

00:48:31.580 | So convolutional neural networks

00:48:32.820 | have this feed-forward processing cascade,

00:48:35.260 | which is called feature detectors and pooling.

00:48:38.620 | And that is repeated in the hierarchy,

00:48:40.620 | in a multi-level system.

00:48:43.660 | And if you want an intuitive idea of what is happening,

00:48:47.820 | feature detectors are, you know,

00:48:50.460 | detecting interesting co-occurrences in the input.

00:48:54.340 | It can be a line, a corner, an eye,

00:48:58.460 | or a piece of texture, et cetera.

00:49:00.860 | And the pooling neurons are doing

00:49:04.180 | some local transformation of that

00:49:06.340 | and making it invariant to local transformations.

00:49:08.820 | So this is what the structure

00:49:09.780 | of convolutional neural network is.

00:49:11.580 | Recursive cortical network has a similar structure

00:49:16.540 | when you look at just the feed-forward pathway.

00:49:18.900 | But in addition to that,

00:49:20.020 | it is also structured in a way that it is generative.

00:49:22.660 | So that it can run it backward

00:49:25.660 | and combine the forward with the backward.

00:49:28.500 | Another aspect that it has is it has lateral connections.

00:49:33.340 | These lateral connections, which is between,

00:49:37.660 | so if you have an edge here and an edge here,

00:49:40.220 | it has connections between these edges.

00:49:42.060 | It is not just feed-forward connections.

00:49:43.780 | It is something between these edges,

00:49:47.380 | which is the nodes representing these edges,

00:49:50.300 | which is to enforce compatibility between them.

00:49:53.020 | So otherwise what will happen is that-

00:49:54.220 | - Like constraints?

00:49:55.220 | - It's a constraint.

00:49:56.060 | It's basically, if you do just feature detection

00:49:59.620 | followed by pooling,

00:50:01.300 | then your transformations in different parts

00:50:04.460 | of the visual field are not coordinated.

00:50:06.500 | And so you will create jagged,

00:50:10.620 | when you generate from the model,

00:50:12.060 | you will create jagged things

00:50:14.820 | and uncoordinated transformations.

00:50:17.420 | So these lateral connections

00:50:18.860 | are enforcing the transformations.

00:50:22.020 | - Is the whole thing still differentiable?

00:50:24.780 | - No.

00:50:25.620 | - Okay.

00:50:26.460 | - No.

00:50:27.300 | (laughs)

00:50:28.140 | It's not trained using a backprop.

00:50:30.260 | - Okay, that's really important.

00:50:31.540 | So there's these feed-forward,

00:50:33.940 | there's feedback mechanisms.

00:50:35.940 | There's some interesting connectivity things.

00:50:37.780 | It's still layered?

00:50:38.940 | Like-

00:50:39.780 | - Yes, there are multiple levels.

00:50:41.420 | - Multiple layers.

00:50:43.500 | Okay, very, very interesting.

00:50:45.860 | And yeah, okay, so the interconnection between adjacent,

00:50:49.900 | so connections across service constraints

00:50:53.100 | that keep the thing stable.

00:50:55.340 | - Correct.

00:50:56.260 | - Okay, so what else?

00:50:58.300 | - And then there's this idea of doing inference.

00:51:01.100 | A neural network does not do inference on the fly.

00:51:05.500 | So an example of why this inference is important is,

00:51:09.220 | so one of the first applications

00:51:11.780 | that we showed in the paper was to crack text-based captures.

00:51:16.780 | - What are captures, by the way?

00:51:19.180 | (laughs)

00:51:20.020 | - Yeah.

00:51:20.860 | - By the way, one of the most awesome,

00:51:22.140 | like the people don't use this term anymore,

00:51:24.140 | it's human computation, I think.

00:51:25.780 | I love this term.

00:51:28.020 | The guy who created captures,

00:51:29.620 | I think came up with this term.

00:51:30.860 | - Yeah.

00:51:31.700 | - I love it.

00:51:32.540 | Anyway, what are captures?

00:51:35.580 | - So captures are those strings that you fill in

00:51:39.860 | when you're, you know,

00:51:40.820 | if you're opening a new account in Google,

00:51:43.300 | they show you a picture, you know,

00:51:45.380 | usually it used to be set of garbled letters

00:51:48.700 | that you have to kind of figure out

00:51:50.660 | what is that string of characters and type it.

00:51:53.340 | And the reason captures exist is because, you know,

00:51:57.740 | Google or Twitter do not want

00:52:00.980 | automatic creation of accounts.

00:52:02.660 | You can use a computer to create millions of accounts

00:52:06.540 | and use that for nefarious purposes.

00:52:10.740 | So you want to make sure that, to the extent possible,

00:52:13.860 | the interaction that their system is having

00:52:16.940 | is with a human.

00:52:18.020 | So it's called a human interaction proof.

00:52:20.900 | A capture is a human interaction proof.

00:52:23.540 | So this is, captures are by design,

00:52:27.340 | things that are easy for humans to solve,

00:52:29.820 | but hard for computers.

00:52:30.820 | - Hard for robots, yeah.

00:52:31.860 | - Yeah.

00:52:32.780 | So, and text-based captures was the one which is prevalent

00:52:37.780 | until around 2014,

00:52:39.780 | because at that time, text-based captures

00:52:42.260 | were hard for computers to crack.

00:52:44.740 | Even now, they are actually,

00:52:46.780 | in the sense of an arbitrary text-based capture

00:52:49.780 | will be unsolvable even now.

00:52:51.740 | But with the techniques that we have developed,

00:52:53.940 | it can be, you know, you can quickly develop a mechanism

00:52:56.380 | that solves the capture.

00:52:58.940 | - They've probably gotten a lot harder too.

00:53:00.740 | The people, they've been getting cleverer and cleverer

00:53:03.860 | generating these text captures.

00:53:05.380 | - Correct, correct.

00:53:06.300 | - So, okay, so that was one of the things you've tested it on

00:53:09.420 | is these kinds of captures in 2014, '15, that kind of stuff.

00:53:13.740 | So what, I mean, by the way, why captures?

00:53:18.500 | - Yeah, yeah.

00:53:19.740 | Even now, I would say capture

00:53:21.420 | is a very, very good challenge problem.

00:53:23.580 | If you want to understand how human perception works

00:53:27.380 | and if you want to build systems that work

00:53:30.500 | like the human brain.

00:53:32.300 | And I wouldn't say capture is a solved problem.

00:53:34.860 | We have cracked the fundamental defense of captures,

00:53:37.620 | but it is not solved in the way that humans solve it.

00:53:41.980 | So I can give an example.

00:53:43.420 | I can take a five-year-old child

00:53:46.500 | who has just learned characters

00:53:48.860 | and show them any new capture that we create.

00:53:53.220 | They will be able to solve it.

00:53:54.740 | I can show you pretty much any new capture

00:53:58.980 | from any new website.

00:54:00.620 | You'll be able to solve it

00:54:01.580 | without getting any training examples

00:54:04.060 | from that particular style of capture.

00:54:05.860 | - You're assuming I'm human, yeah.

00:54:07.180 | - Yes, yeah, that's right.

00:54:10.020 | So if you are human,

00:54:11.100 | otherwise I will be able to figure that out using this one.

00:54:15.740 | - This whole podcast is just a touring test,

00:54:19.220 | a long touring test.

00:54:20.980 | Anyway, I'm sorry.

00:54:21.820 | So yeah, so humans can figure it out

00:54:24.700 | with very few examples.

00:54:26.260 | - Or no training examples.

00:54:28.220 | No training examples from that particular style of capture.

00:54:31.180 | And so even now this is unreachable

00:54:35.980 | for the current deep learning system.

00:54:38.220 | So basically there is no,

00:54:39.860 | I don't think a system exists

00:54:41.220 | where you can basically say,

00:54:42.460 | train on whatever you want.

00:54:44.300 | And then now say,

00:54:45.740 | hey, I will show you a new capture,

00:54:47.660 | which I did not show you in the training setup.

00:54:50.940 | Will the system be able to solve it?

00:54:52.740 | Still doesn't exist.

00:54:54.820 | So that is the magic of human perception.

00:54:58.020 | And Doug Hofstadter put this very beautifully

00:55:02.180 | in one of his talks.

00:55:04.700 | The central problem in AI is what is the letter A?

00:55:10.500 | If you can build a system that reliably can detect

00:55:15.140 | all the variations of the letter A,

00:55:16.980 | you don't even need to go to the...

00:55:18.780 | - The B and the C.

00:55:20.860 | - Yeah, you don't even need to go to the B and the C

00:55:22.620 | or the strings of characters.

00:55:24.140 | And so that is the spirit at which,

00:55:27.140 | with which we tackle that problem.

00:55:28.820 | - What does he mean by that?

00:55:30.060 | I mean, is it like without training examples,

00:55:34.900 | try to figure out the fundamental elements

00:55:39.020 | that make up the letter A in all of its forms?

00:55:43.620 | - In all of its forms.

00:55:44.460 | It can be, A can be made with two humans standing,

00:55:47.060 | leaning against each other, holding the hands.

00:55:49.420 | And it can be made of leaves.

00:55:51.580 | It can be...

00:55:52.420 | - Yeah, you might have to understand everything

00:55:54.780 | about this world in order to understand the letter A.

00:55:57.100 | - Exactly.

00:55:58.100 | - So it's common sense reasoning, essentially.

00:56:00.540 | - Right.

00:56:01.380 | So to finally, to really solve,

00:56:03.700 | finally to say that you have solved capture,

00:56:07.980 | you have to solve the whole problem.

00:56:09.580 | (both laughing)

00:56:11.260 | - Yeah, okay.

00:56:12.100 | So how does this kind of the RCN architecture

00:56:15.780 | help us to get, do a better job of that kind of thing?

00:56:18.940 | - Yeah, so as I mentioned,

00:56:21.260 | one of the important things was being able to do inference,

00:56:25.100 | being able to dynamically do inference.

00:56:27.220 | - Can you clarify what you mean?

00:56:30.700 | 'Cause you said like neural networks don't do inference.

00:56:33.180 | - Yeah.

00:56:34.020 | - So what do you mean by inference in this context then?

00:56:35.940 | - So, okay, so in captures,

00:56:38.380 | what they do to confuse people

00:56:40.380 | is to make these characters crowd together.

00:56:43.460 | - Yes.

00:56:44.300 | - Okay, and when you make the characters crowd together,

00:56:46.740 | what happens is that you will now start seeing

00:56:49.340 | combinations of characters as some other new character

00:56:52.340 | or an existing character.

00:56:53.780 | So you would put an R and N together,

00:56:56.820 | it will start looking like an M.

00:56:59.140 | And so locally, there is very strong evidence

00:57:03.660 | for it being some incorrect character.

00:57:08.660 | But globally, the only explanation that fits together

00:57:12.620 | is something that is different

00:57:14.260 | from what you can find locally.

00:57:16.140 | - Yes.

00:57:16.980 | - So this is inference.

00:57:18.860 | You are basically taking local evidence

00:57:22.260 | and putting it in the global context

00:57:24.620 | and often coming to a conclusion locally,

00:57:27.780 | which is conflicting with the local information.

00:57:30.100 | - So actually, so you mean inference like

00:57:33.940 | in the way it's used when you talk about reasoning,

00:57:36.660 | for example, as opposed to like inference,

00:57:39.420 | which is with artificial neural networks,

00:57:42.380 | which is a single pass to the network.

00:57:44.020 | - Correct.

00:57:44.860 | - Okay.

00:57:45.700 | So like you're basically doing some basic forms of reasoning

00:57:48.740 | like integration of like how local things fit

00:57:52.780 | into the global picture.

00:57:54.140 | - And things like explaining away coming into this one

00:57:56.940 | because you are explaining that piece of evidence

00:58:01.420 | as something else because globally,

00:58:04.220 | that's the only thing that makes sense.

00:58:06.180 | So now you can amortize this inference by,

00:58:11.180 | you know, in a neural network,

00:58:13.260 | if you want to do this, you can brute force it.

00:58:16.260 | You can just show it all combinations of things

00:58:19.460 | that you want your reasoning to work over.

00:58:23.260 | And you can, you know, like just train the hell

00:58:26.140 | out of that neural network.

00:58:27.420 | And it will look like it is doing, you know,

00:58:30.540 | inference on the fly,

00:58:31.460 | but it is really just doing amortized inference.

00:58:35.020 | It is because you have shown it a lot of these combinations

00:58:38.620 | during training time.

00:58:40.460 | So what you want to do is be able to do dynamic inference

00:58:44.540 | rather than just being able to show all those combinations

00:58:47.420 | in the training time.

00:58:48.620 | And that's something we emphasized in the model.

00:58:51.780 | - What does it mean dynamic inference?

00:58:53.900 | Is that that has to do with the feedback thing?

00:58:56.300 | - Yes.

00:58:57.140 | - Like what is dynamic?

00:58:58.740 | I'm trying to visualize what dynamic inference

00:59:01.500 | would be in this case.

00:59:02.500 | Like, what is it doing with the input?

00:59:05.180 | It's shown the input the first time.

00:59:07.780 | - Yeah.

00:59:08.620 | - And it's like, what's changing over temporarily?

00:59:12.420 | What's the dynamics of this inference process?

00:59:14.860 | - So you can think of it as you have at the top of the model,

00:59:18.900 | the characters that you are trained on,

00:59:21.220 | they are the causes.

00:59:23.020 | You're trying to explain the pixels

00:59:25.780 | using the characters as the causes.

00:59:28.900 | The characters are the things that cause the pixels.

00:59:32.260 | - Yeah, so there's this causality thing.

00:59:34.980 | So the reason you mentioned causality, I guess,

00:59:37.780 | is because there's a temporal aspect to this whole thing.

00:59:40.860 | - In this particular case,

00:59:41.900 | the temporal aspect is not important.

00:59:43.380 | It is more like when, if I turn the character on,

00:59:47.740 | the pixels will turn on.

00:59:49.180 | Yeah, it will be after this a little bit, but yeah.

00:59:51.060 | - Okay, so it's causality in the sense

00:59:53.300 | of like a logic causality, like hence inference, okay.

00:59:57.540 | - The dynamics is that even though locally,

01:00:01.340 | it will look like, okay, this is an A.

01:00:03.860 | And locally, just when I look at just that patch of the image,

01:00:09.140 | it looks like an A.

01:00:10.660 | But when I look at it in the context of all the other causes,

01:00:14.500 | it might not, A is not something that makes sense.

01:00:17.380 | So that is something you have to kind of,

01:00:19.220 | you know, recursively figure out.

01:00:20.660 | - Yeah, so, okay, so, and this thing performed pretty well

01:00:25.340 | on the CAPTCHAs.

01:00:26.260 | - Correct.

01:00:27.100 | - And I mean, is there some kind of interesting intuition

01:00:32.100 | you can provide why it did well?

01:00:34.100 | Like, what did it look like?

01:00:36.060 | Is there visualizations that could be human interpretable

01:00:38.620 | to us humans?

01:00:39.740 | - Yes, yeah, so the good thing about the model

01:00:41.980 | is that it is extremely,

01:00:44.500 | so it is not just doing a classification, right?

01:00:46.620 | It is providing a full explanation for the scene.

01:00:50.540 | So when it operates on a scene,

01:00:55.180 | it is coming at back and saying,

01:00:57.100 | look, this is the part is the A,

01:00:59.780 | and these are the pixels that turned on,

01:01:02.140 | these are the pixels in the input that tells,

01:01:05.700 | makes me think that it is an A.

01:01:08.100 | And also these are the portions I hallucinated.

01:01:10.940 | It provides a complete explanation of that form.

01:01:15.340 | And then these are the contours, this is the interior,

01:01:19.660 | and this is in front of this other object.

01:01:22.660 | So that's the kind of explanation

01:01:25.700 | the inference network provides.

01:01:28.500 | So that is useful and interpretable.

01:01:32.140 | And then the kind of errors it makes are also,

01:01:38.460 | I don't want to read too much into it,

01:01:42.980 | but the kind of errors the network makes

01:01:45.220 | are very similar to the kinds of errors

01:01:48.180 | humans would make in a similar situation.

01:01:50.420 | - So there's something about the structure

01:01:51.940 | that feels reminiscent of the way

01:01:54.420 | humans' visual system works.

01:01:58.900 | Well, I mean, how hard-coded is this

01:02:01.900 | to the capture problem, this idea?

01:02:04.420 | - Not really hard-coded because it's the,

01:02:06.820 | the assumptions, as I mentioned, are general, right?

01:02:09.340 | It is more, and those themselves can be applied

01:02:13.300 | in many situations which are natural signals.

01:02:17.180 | So it's the foreground versus background factorization

01:02:20.700 | and the factorization of the surfaces versus the contours.

01:02:25.460 | So these are all generally applicable assumptions.

01:02:27.700 | - In all vision.

01:02:29.060 | So why capture, why attack the capture problem,

01:02:34.060 | which is quite unique in the computer vision context

01:02:36.700 | versus like the traditional benchmarks of ImageNet

01:02:40.900 | and all those kinds of image classification

01:02:43.740 | or even segmentation tasks and all that kind of stuff.

01:02:46.180 | Do you feel like that's, I mean,

01:02:48.340 | what's your thinking about those kinds of benchmarks

01:02:50.780 | in this context?

01:02:53.540 | - I mean, those benchmarks are useful

01:02:55.220 | for deep learning kind of algorithms where you,

01:02:58.620 | so the settings that deep learning works in are,

01:03:02.300 | here is my huge training set and here is my test set.

01:03:05.940 | So the training set is almost 100x, 1000x bigger

01:03:10.420 | than the test set in many cases.

01:03:14.140 | What we wanted to do was invert that.

01:03:17.500 | The training set is way smaller than the test set.

01:03:21.860 | - Yes.

01:03:22.700 | - And, you know, capture is a problem that is by definition

01:03:27.700 | hard for computers and it has these good properties

01:03:33.100 | of strong generalization, strong out of training

01:03:36.820 | distribution generalization.

01:03:38.180 | If you are interested in studying that and putting,

01:03:42.580 | having your model have that property,

01:03:44.580 | then it's a good data set to tackle.

01:03:46.820 | - So is there, have you attempted to,

01:03:49.300 | which I think, I believe there's quite a growing body

01:03:52.460 | of work on looking at MNIST and ImageNet without training.

01:03:57.460 | So like taking like the basic challenges,

01:04:01.460 | how, what tiny fraction of the training set can we take

01:04:05.760 | in order to do a reasonable job of the classification task?

01:04:10.460 | Have you explored that angle in these classic benchmarks?

01:04:15.020 | - Yes, so we did do MNIST.

01:04:16.580 | So, you know, so it's not just capture.

01:04:19.380 | So there was also versions of, multiple versions of MNIST,

01:04:24.380 | including the standard version,

01:04:27.340 | which where we inverted the problem,

01:04:28.940 | which is basically saying rather than train

01:04:30.780 | on 60,000 training data, you know,

01:04:33.700 | how quickly can you get to high level accuracy

01:04:38.500 | with very little training data?

01:04:40.060 | - Is there some performance that you remember,

01:04:42.220 | like how well, how well did it do?

01:04:45.260 | How many examples did it need?

01:04:47.420 | - Yeah, I, you know, I remember that it was,

01:04:51.260 | you know, on the order of tens or hundreds of examples

01:04:56.260 | to get into 95% accuracy.

01:04:59.580 | And it was, it was definitely better than the systems,

01:05:02.180 | other systems out there at that time.

01:05:03.940 | - At that time. - Yeah.

01:05:04.820 | - Yeah, they're really pushing it.

01:05:05.900 | I think that's a really interesting space, actually.

01:05:09.340 | I think there's an actual name for MNIST that,

01:05:14.340 | like there's different names

01:05:16.740 | for the different sizes of training sets.

01:05:19.220 | I mean, people are like attacking this problem.

01:05:21.260 | I think it's super interesting.

01:05:22.900 | It's funny how like the MNIST will probably be with us

01:05:27.060 | all the way to AGI.

01:05:28.660 | - Yes. (laughs)

01:05:29.900 | - It's a data set that just sticks by.

01:05:31.700 | It is, it's a clean, simple data set

01:05:35.860 | to study the fundamentals of learning

01:05:38.140 | with just like CAPTCHAs, it's interesting.

01:05:40.460 | Not enough people, I don't know, maybe you can correct me,

01:05:43.900 | but I feel like CAPTCHAs don't show up as often in papers

01:05:47.140 | as they probably should.

01:05:48.380 | - That's correct, yeah.

01:05:49.540 | Because, you know, usually these things have a momentum,

01:05:53.700 | you know, once something gets established

01:05:57.660 | as a standard benchmark, there is a dynamics

01:06:02.340 | of how graduate students operate

01:06:04.660 | and how academic system works

01:06:07.420 | that pushes people to track that benchmark.

01:06:10.860 | - Yeah, to focus.

01:06:12.140 | - Yeah.

01:06:12.980 | - Nobody wants to think outside the box, okay.

01:06:15.540 | - Yes.

01:06:16.780 | - Okay, so good performance on the CAPTCHAs.

01:06:20.700 | What else is there interesting on the RCN side

01:06:23.940 | before we talk about the cortical microscope?

01:06:25.700 | - Yeah, so the same model,

01:06:27.780 | so the important part of the model

01:06:30.620 | was that it trains very quickly

01:06:32.260 | with very little training data.

01:06:33.860 | And it's quite robust to out-of-distribution perturbations.

01:06:38.860 | And we are using that very fruitfully

01:06:44.300 | and advocatiously in many of the robotics tasks

01:06:47.260 | we are solving.

01:06:48.100 | - You're solving that.

01:06:48.940 | Well, let me ask you this kind of touchy question.

01:06:51.940 | I have to, I've spoken with your friend, colleague,

01:06:56.100 | Jeff Hawkins, too.

01:06:56.940 | I mean, I have to kind of ask, there is a bit of,

01:07:02.100 | whenever you have brain-inspired stuff

01:07:04.740 | and you make big claims, big sexy claims,

01:07:08.320 | there's critics, I mean, machine learning subreddit.

01:07:14.660 | Don't get me started on those people.

01:07:16.500 | I mean, criticism is good, but they're a bit over the top.

01:07:21.460 | There is quite a bit of sort of skepticism and criticism.

01:07:26.660 | Is this work really as good as it promises to be?

01:07:30.540 | Do you have thoughts on that kind of skepticism?

01:07:34.860 | Do you have comments on the kind of criticism

01:07:36.780 | we might've received about, is this approach legit?

01:07:41.460 | Is this a promising approach?

01:07:44.620 | Or at least as promising as it seems to be advertised as?

01:07:49.620 | - Yeah, I can comment on it.

01:07:51.700 | So our Arsene paper is published in Science,

01:07:55.300 | which I would argue is a very high quality journal,

01:07:58.420 | very hard to publish in.

01:08:00.260 | And usually it is indicative of the quality of the work.

01:08:04.540 | And I am very, very certain that the ideas

01:08:09.540 | that we brought together in that paper

01:08:13.420 | in terms of the importance of feedback connections,

01:08:16.580 | recursive inference, lateral connections,

01:08:19.260 | coming to best explanation of the scene

01:08:21.940 | as the problem to solve,

01:08:23.500 | trying to solve recognition, segmentation, all jointly

01:08:28.780 | in a way that is compatible with higher level cognition,

01:08:31.420 | top-down attention, all those ideas

01:08:33.100 | that we brought together into something coherent

01:08:35.460 | and workable in the world and tackling a challenging problem,

01:08:39.860 | I think that will stay and that contribution I stand by.

01:08:44.300 | Now, I can tell you a story which is funny

01:08:48.340 | in the context of this.

01:08:49.980 | So if you read the abstract of the paper

01:08:51.980 | and the argument we are putting in,

01:08:55.140 | look, current deep learning systems

01:08:56.780 | take a lot of training data.

01:08:58.780 | They don't use these insights.

01:09:00.700 | And here is our new model,

01:09:02.540 | which is not a deep neural network, it's a graphical model.

01:09:04.700 | It does inference.

01:09:05.540 | This is what the paper is, right?

01:09:07.380 | Now, once the paper was accepted and everything,

01:09:10.020 | it went to the press department in Science,

01:09:13.900 | AAAS Science Office.

01:09:15.180 | We didn't do any press release when it was published.

01:09:17.340 | It went to the press department.

01:09:19.060 | What was the press release that they wrote up?

01:09:21.900 | A new deep learning model. (laughs)

01:09:24.900 | - Solves CAPTCHAs.

01:09:26.100 | - Solves CAPTCHAs.

01:09:27.260 | And so you can see what was being hyped in that thing.

01:09:32.260 | So it's like there is a dynamic in the community.

01:09:37.780 | That especially happens when there are lots of new people

01:09:43.940 | coming into the field and they get attracted to one thing.

01:09:46.860 | And some people are trying to think different

01:09:49.820 | compared to that.

01:09:50.660 | So there is some, I think skepticism in science

01:09:53.820 | is important and it is very much required.

01:09:58.460 | But it's also, it's not skepticism usually,

01:10:01.540 | it's mostly bandwagon effect that is happening

01:10:04.060 | rather than-

01:10:04.900 | - But that's not even that.

01:10:06.980 | I mean, I'll tell you what they react to,

01:10:09.060 | which is like, I'm sensitive to as well.

01:10:12.220 | If you look at just companies, OpenAI, DeepMind,

01:10:15.220 | Vicarious, I mean, there's a little bit of a race

01:10:23.260 | to the top and hype, right?

01:10:25.500 | It's like, it doesn't pay off to be humble.

01:10:29.740 | (laughs)

01:10:31.940 | So like, and the press is just irresponsible often.

01:10:36.940 | They just, I mean, don't get me started

01:10:38.900 | on the state of journalism today.

01:10:41.380 | Like, it seems like the people who write articles

01:10:43.580 | about these things, they literally have not even spent

01:10:47.620 | an hour on the Wikipedia article

01:10:49.500 | about what is neural networks.

01:10:51.780 | They haven't invested just even the language to laziness.

01:10:56.780 | It's like, robots beat humans.

01:11:01.900 | Like, they write this kind of stuff that just,

01:11:06.020 | and then of course the researchers are quite sensitive

01:11:09.300 | to that because it gets a lot of attention.

01:11:12.260 | They're like, why did this word get so much attention?

01:11:15.020 | That's over the top and people get really sensitive.

01:11:19.260 | - The same kind of criticism with,

01:11:21.540 | OpenAI did work with Rubik's Cube with the robot

01:11:24.540 | that people criticized.

01:11:26.700 | Same with GPT-2 and 3, they criticize.

01:11:30.220 | Same thing with DeepMinds with AlphaZero.

01:11:33.700 | I mean, yeah, I'm sensitive to it, but,

01:11:37.340 | and of course with your work, you mentioned deep learning,

01:11:40.260 | but there's something super sexy to the public

01:11:43.460 | about brain-inspired.

01:11:45.540 | I mean, that immediately grabs people's imagination.

01:11:48.440 | Not even like neural networks, but like really brain-inspired.

01:11:53.440 | - Got it.

01:11:54.300 | - Like brain-like neural networks.

01:11:57.500 | That seems really compelling to people and to me as well,

01:12:01.180 | to the world as a narrative.

01:12:03.500 | And so people hook up, hook onto that,

01:12:07.580 | and sometimes the skepticism engine turns on

01:12:12.300 | in the research community and they're skeptical.

01:12:15.040 | But I think putting aside the ideas

01:12:19.060 | of the actual performance on CAPTCHAs

01:12:20.900 | or performance on any dataset,

01:12:22.940 | I mean, to me, all these datasets are useless anyway.

01:12:26.820 | It's nice to have them, but in the grand scheme of things,

01:12:30.220 | they're silly toy examples.

01:12:32.180 | The point is, is there intuition about the ideas,

01:12:37.120 | just like you mentioned,

01:12:38.540 | bringing the ideas together in a unique way?

01:12:41.540 | Is there something there?

01:12:42.780 | Is there some value there?

01:12:44.180 | And is it gonna stand the test of time?

01:12:45.920 | - Yes.

01:12:46.760 | - And that's the hope.

01:12:47.580 | - Yes.

01:12:48.420 | - That's the hope.

01:12:49.320 | - My confidence there is very high.

01:12:51.680 | I don't treat brain-inspired as a marketing term.

01:12:55.040 | I am looking into the details of biology

01:13:00.080 | and puzzling over those things,

01:13:03.320 | and I am grappling with those things.

01:13:05.840 | And so it is not a marketing term at all.

01:13:08.920 | You can use it as a marketing term,

01:13:10.440 | and people often use it.

01:13:12.240 | And you can get combined with them.

01:13:14.500 | And when people don't understand

01:13:16.700 | how we are approaching the problem,

01:13:18.180 | it is easy to be misunderstood

01:13:21.320 | and think of it as purely marketing,

01:13:24.300 | but that's not the way we are.

01:13:26.220 | - So you really, I mean, as a scientist,

01:13:30.100 | you believe that if we kinda just stick

01:13:33.240 | to really understanding the brain,

01:13:35.380 | that's the right, like you should constantly meditate

01:13:39.740 | on the how does the brain do this?

01:13:42.580 | 'Cause that's going to be really helpful

01:13:44.240 | for engineering intelligence systems.

01:13:46.560 | - Yes, you need to, so I think it's one input,

01:13:49.800 | and it is helpful, but you should know

01:13:53.820 | when to deviate from it too.

01:13:56.280 | So an example is convolutional neural networks, right?

01:13:59.680 | Convolution is not an operation brain implements.

01:14:04.280 | The visual cortex is not convolutional.

01:14:07.400 | Visual cortex has local receptive fields,

01:14:10.180 | local connectivity, but there is no translation

01:14:15.180 | in invariance in the network weights in the visual cortex.

01:14:21.180 | That is a computational trick,

01:14:25.720 | which is a very good engineering trick

01:14:27.200 | that we use for sharing the training

01:14:29.780 | between the different nodes.

01:14:31.780 | So, and that trick will be with us for some time.

01:14:35.580 | It will go away when we have robots

01:14:38.180 | with eyes and heads that move.

01:14:43.460 | And so then that trick will go away.

01:14:45.460 | It will not be useful at that time.

01:14:47.900 | So--

01:14:48.740 | - So the brain doesn't have translational invariance.

01:14:53.060 | It has the focal point, like it has a thing it focuses on.

01:14:56.020 | - Correct, it has a fovea, and because of the fovea,

01:14:59.480 | the receptive fields are not like the copying of the weights,

01:15:04.140 | like the weights in the center are very different

01:15:06.660 | from the weights in the periphery.

01:15:07.820 | - Yes, at the periphery.

01:15:08.780 | I mean, I did this, actually wrote a paper

01:15:12.540 | and just gotten a chance to really study peripheral vision,

01:15:16.820 | which is a fascinating thing.

01:15:19.060 | Very under understood thing of what the,

01:15:24.060 | at every level the brain does with the periphery.

01:15:28.140 | It does some funky stuff.

01:15:29.740 | So it's another kind of trick than convolutional.

01:15:34.740 | Like it does, it's, you know, convolutional,

01:15:39.900 | convolution in neural networks is a trick for efficiency,

01:15:43.700 | is an efficiency trick.

01:15:45.180 | And the brain does a whole nother kind of thing, I guess.

01:15:47.540 | - Correct, correct.

01:15:48.420 | So you need to understand the principles

01:15:51.180 | of processing so that you can still apply engineering tricks

01:15:55.740 | where you want it to.

01:15:56.580 | You don't want to be slavishly mimicking

01:15:58.500 | all the things of the brain.

01:16:00.580 | And so, yeah, so it should be one input

01:16:02.580 | and I think it is extremely helpful,

01:16:05.100 | but it should be the point of really understanding

01:16:08.420 | so that you know when to deviate from it.

01:16:10.860 | - So, okay, that's really cool.

01:16:13.260 | That's work from a few years ago.

01:16:15.860 | So you did work in Numenta with Jeff Hawkins

01:16:19.260 | with hierarchical temporal memory.

01:16:23.340 | How is your just, if you could just give a brief history,

01:16:27.340 | how is your view of the way the models of the brain changed

01:16:31.820 | over the past few years leading up to now?

01:16:35.660 | Is there some interesting aspects

01:16:37.360 | where there was an adjustment

01:16:40.060 | to your understanding of the brain

01:16:41.660 | or is it all just building on top of each other?

01:16:44.020 | - In terms of the higher level ideas,

01:16:46.140 | especially the ones Jeff wrote about in the book,

01:16:49.140 | if you blur out, right, you know.

01:16:51.140 | - Yeah, on intelligence.

01:16:52.380 | - Right, on intelligence.

01:16:53.380 | If you blur out the details and if you just zoom out

01:16:56.580 | and at the higher level idea,

01:16:57.980 | things are, I would say, consistent

01:17:00.820 | with what he wrote about,

01:17:01.860 | but many things will be consistent with that

01:17:04.340 | because it's a blur, you know, when you,

01:17:07.100 | deep learning systems are also, you know,

01:17:09.260 | multi-level, hierarchical, all of those things, right?

01:17:12.020 | But in terms of the detail, a lot of things are different

01:17:18.620 | and those details matter a lot.

01:17:22.300 | So one point of difference I had with Jeff

01:17:26.860 | was how to approach, you know,

01:17:29.900 | how much of biological plausibility and realism

01:17:33.740 | do you want in the learning algorithms?

01:17:35.740 | So when I was there, this was, you know,

01:17:41.100 | almost 10 years ago now, so--

01:17:42.500 | - Yeah, flies when you're having fun.

01:17:44.540 | - I don't know what Jeff thinks now,

01:17:46.540 | but 10 years ago, the difference was that

01:17:49.940 | I did not want to be so constrained on saying,

01:17:54.820 | my learning algorithms need to be biologically plausible

01:17:58.380 | based on some filter of biological plausibility

01:18:02.020 | available at that time.

01:18:03.380 | To me, that is a dangerous cut to make

01:18:05.940 | because we are, you know,

01:18:08.060 | discovering more and more things about the brain

01:18:09.620 | all the time.

01:18:10.580 | New biophysical mechanisms, new channels

01:18:13.740 | are being discovered all the time.

01:18:15.500 | So I don't want to upfront kill off a learning algorithm

01:18:20.180 | just because we don't really understand

01:18:22.620 | the full biophysics or whatever of how the brain learns.

01:18:27.620 | - Exactly, exactly.

01:18:28.900 | - But let me ask, and I'm sorry to interrupt,

01:18:30.900 | like, what's your sense, what's our best understanding

01:18:34.740 | of how the brain learns?

01:18:36.620 | - So things like back propagation, credit assignment,

01:18:40.780 | so many of these algorithms have,

01:18:43.340 | learning algorithms have things in common, right?

01:18:45.540 | It is, back propagation is one way of credit assignment.

01:18:49.380 | There is another algorithm called expectation maximization,

01:18:52.660 | which is, you know, another weight adjustment algorithm.

01:18:56.100 | - But is it your sense the brain does something like this?

01:18:58.980 | - Has to, there is no way around it

01:19:01.380 | in the sense of saying that you do have to adjust

01:19:04.380 | the connections.

01:19:06.340 | - So, and you're saying credit assignment,

01:19:08.020 | you have to reward the connections that were useful

01:19:10.180 | in making a correct prediction and not,

01:19:12.620 | yeah, I guess, but yeah, it doesn't have to be differentiable.

01:19:17.620 | - Yeah, it doesn't have to be differentiable.

01:19:19.060 | - Yeah, but you have to have a, you know,

01:19:22.260 | you have a model that you start with,

01:19:24.380 | you have data comes in, and you have to have a way

01:19:27.700 | of adjusting the model such that it better fits the data.

01:19:32.220 | So that is all of learning, right?

01:19:34.660 | And some of them can be using backprop to do that.

01:19:38.020 | Some of it can be using, you know,

01:19:40.540 | very local graph changes to do that.

01:19:45.500 | That can, you know, many of these learning algorithms

01:19:48.780 | have similar update properties locally

01:19:52.820 | in terms of what the neurons need to do locally.

01:19:57.340 | - I wonder if small differences in learning algorithms

01:19:59.820 | can have huge differences in the actual effect.

01:20:02.380 | So the dynamics of, I mean, sort of the reverse,

01:20:07.220 | like spiking, like if credit assignment is like a lightning

01:20:13.100 | versus like a rainstorm or something,

01:20:15.820 | like whether there's like a looping local type of situation

01:20:20.820 | with the credit assignment,

01:20:25.580 | whether there is like regularization,

01:20:29.460 | like how it injects robustness into the whole thing,

01:20:34.460 | like whether it's chemical or electrical or mechanical,

01:20:40.660 | all those kinds of things.

01:20:42.620 | - Yes.

01:20:43.460 | - I feel like it, that, yeah.

01:20:47.100 | I feel like those differences could be essential, right?

01:20:49.540 | - It could be.

01:20:50.380 | It's just that you don't know enough to,

01:20:53.980 | on the learning side, you don't know enough to say

01:20:58.460 | that is definitely not the way the brain does it.

01:21:01.180 | - Got it.

01:21:02.020 | So you don't want to be stuck to it.

01:21:02.980 | - Right.

01:21:03.820 | - So that, yeah.

01:21:04.660 | So you've been open-minded on that side of things.

01:21:07.060 | - Correct.

01:21:07.900 | On the inference side, on the recognition side,

01:21:09.220 | I am much more amenable to being constrained

01:21:12.420 | because it's much easier to do experiments

01:21:14.900 | because it's like, okay, here's the stimulus.

01:21:17.620 | How many steps did it get to take the answer?

01:21:19.780 | I can trace it back.

01:21:21.260 | I can understand the speed of that computation, et cetera,

01:21:24.860 | much more readily on the inference side.

01:21:27.900 | - Got it.

01:21:28.740 | And then you can't do good experiments on the learning side.

01:21:30.860 | - Correct.

01:21:31.700 | - So let's go right into cortical microcircuits right back.

01:21:37.500 | So what are these ideas

01:21:40.500 | beyond recursive cortical network that you're looking at now?

01:21:45.260 | - So we have made a pass through multiple of the steps

01:21:49.580 | that as I mentioned earlier,

01:21:52.700 | we were looking at perception from the angle of cognition.

01:21:56.340 | It was not just perception for perception's sake.

01:21:58.860 | How do you connect it to cognition?

01:22:01.620 | How do you learn concepts?

01:22:03.020 | And how do you learn abstract reasoning?

01:22:05.820 | Similar to some of the things Francois talked about.

01:22:09.980 | So we have taken one pass through it,

01:22:14.980 | basically saying,

01:22:16.420 | what is the basic cognitive architecture

01:22:19.540 | that you need to have,

01:22:21.020 | which has a perceptual system,

01:22:23.060 | which has a system that learns dynamics of the world,

01:22:25.980 | and then has something like a routine,

01:22:29.900 | program learning system on top of it to learn concepts.

01:22:33.220 | So we have built one, the version 0.1 of that system.

01:22:38.300 | This was another science robotics paper.

01:22:41.500 | It's the title of that paper was,

01:22:44.340 | something like cognitive programs.

01:22:45.860 | How do you build cognitive programs?

01:22:48.180 | And-

01:22:49.020 | - And the application there was on manipulation,

01:22:52.580 | robotic manipulation?

01:22:53.420 | - It was, so think of it like this.

01:22:56.100 | Suppose you wanted to tell a new person that you met,

01:23:01.100 | you don't know the language that person uses.

01:23:04.980 | You want to communicate to that person

01:23:07.340 | to achieve some task, right?

01:23:09.100 | So I want to say,

01:23:10.220 | hey, you need to pick up all the red cups

01:23:14.300 | from the kitchen counter and put it here, right?

01:23:17.300 | How do you communicate that, right?

01:23:19.300 | You can show pictures.

01:23:21.060 | You can basically say, look, this is the starting state.

01:23:23.580 | The things are here, this is the ending state.

01:23:27.060 | And what does the person need to understand from that?

01:23:30.020 | The person need to understand

01:23:31.540 | what conceptually happened in those pictures

01:23:33.700 | from the input to the output, right?

01:23:35.940 | So we are looking at pre-verbal conceptual understanding.

01:23:40.940 | Without language, how do you have a set of concepts

01:23:45.540 | that you can manipulate in your head?

01:23:48.300 | And from a set of images of input and output,

01:23:52.340 | can you infer what is happening in those images?

01:23:55.980 | - Got it, with concepts that are pre-language, okay.

01:23:59.140 | So what's it mean for a concept to be pre-language?

01:24:02.500 | Like, why is language so important here?

01:24:07.500 | - So I want to make a distinction between concepts

01:24:13.660 | that are just learned from text,

01:24:16.700 | by just feeding brute force text.

01:24:19.860 | You can start extracting things like, okay,

01:24:24.580 | cow is likely to be on grass.

01:24:27.420 | So those kinds of things you can extract purely from text.

01:24:32.660 | But that's kind of a simple association thing

01:24:35.860 | rather than a concept as an abstraction of something

01:24:39.060 | that happens in the real world, in a grounded way,

01:24:43.140 | that I can simulate it in my mind

01:24:45.780 | and connect it back to the real world.

01:24:48.620 | - And you think kind of the visual world,

01:24:52.180 | concepts in the visual world are somehow lower level

01:24:56.620 | than just the language?

01:24:58.940 | - The lower level kind of makes it feel like,

01:25:00.980 | okay, that's unimportant.

01:25:02.660 | Like, it's more like, I would say the concepts

01:25:07.660 | in the visual and motor system

01:25:12.460 | and the concept learning system,

01:25:15.580 | which if you cut off the language part,

01:25:17.660 | just what we learn by interacting with the world

01:25:20.460 | and abstractions from that,

01:25:21.940 | that is a prerequisite for any real language understanding.

01:25:26.580 | - So you disagree with Chomsky,

01:25:29.260 | 'cause he says language is at the bottom of everything.

01:25:32.140 | - No, yeah, I disagree with Chomsky completely

01:25:34.860 | on so many levels, from universal grammar to, yeah.

01:25:39.860 | - So that was a paper in Science B

01:25:41.660 | on the recursive cortical network.

01:25:44.020 | What other interesting problems are there,

01:25:47.180 | the open problems in brain-inspired approaches

01:25:51.060 | that you're thinking about?

01:25:52.420 | - I mean, everything is open, right?

01:25:53.740 | Like, no problem is solved, solved, right?

01:25:58.460 | First, I think of perception as kind of

01:26:01.420 | the first thing that you have to build,

01:26:05.620 | but the last thing that you will be actually solved.

01:26:08.620 | Because if you do not build perception system

01:26:13.380 | in the right way,

01:26:14.380 | you cannot build concept system in the right way.

01:26:17.340 | So you have to build a perception system,

01:26:19.620 | however wrong that might be,

01:26:21.180 | you have to still build that and learn concepts from there

01:26:24.060 | and then keep iterating.

01:26:26.460 | And finally, perception will get solved fully

01:26:29.740 | when perception, cognition, language,

01:26:31.500 | all those things work together finally.

01:26:34.020 | - So what, and that, so great,

01:26:37.140 | we've talked a lot about perception,

01:26:38.540 | but then maybe on the concept side and like common sense

01:26:42.340 | or just general reasoning side,

01:26:44.660 | is there some intuition you can draw from the brain

01:26:48.220 | about how we can do that?

01:26:50.780 | - So I have this classic example I give.

01:26:55.540 | So suppose I give you a few sentences

01:26:59.060 | and then ask you a question following that sentence.

01:27:01.260 | This is a natural language processing problem, right?

01:27:04.460 | So here it goes.

01:27:06.100 | I'm telling you, Sally pounded a nail on the ceiling.

01:27:10.660 | Okay, that's a sentence.

01:27:13.340 | Now I'm asking you a question,

01:27:14.700 | was the nail horizontal or vertical?

01:27:16.460 | - Vertical.

01:27:18.540 | - Okay, how did you answer that?

01:27:20.140 | - Well, I imagined Sally,

01:27:24.660 | it was kind of hard to imagine what the hell she was doing,

01:27:26.660 | but I imagined a visual of the whole situation.

01:27:31.660 | - Exactly, exactly.

01:27:34.900 | So here, I posed a question in natural language.

01:27:39.060 | The answer to that question was,

01:27:40.820 | you got the answer from actually simulating the scene.

01:27:45.060 | Now I can go more and more detail about,

01:27:47.100 | okay, was Sally standing on something while doing this?

01:27:50.380 | Could she have been standing on a light bulb to do this?

01:27:54.620 | I could ask more and more questions about this

01:27:57.580 | and I can ask, make you simulate the scene

01:28:00.420 | in more and more detail, right?

01:28:02.380 | Where is all that knowledge that you're accessing stored?

01:28:06.700 | It is not in your language system.

01:28:08.460 | It was not just by reading text you got that knowledge.

01:28:12.740 | It is stored from the everyday experiences

01:28:15.420 | that you have had from,

01:28:17.500 | and by the age of five, you have pretty much all of this,

01:28:21.700 | and it is stored in your visual system.

01:28:24.460 | It is stored in your motor system in a way

01:28:27.140 | such that it can be accessed through language.

01:28:29.820 | - Got it.

01:28:31.660 | I mean, right.

01:28:32.500 | So here, the language is just,

01:28:34.300 | almost serves as the query into the whole visual cortex

01:28:36.780 | and that does the whole feedback thing.

01:28:38.620 | But I mean, is all reasoning kind of connected

01:28:42.660 | to the perception system in some way?

01:28:45.980 | - You can do a lot of it.

01:28:47.460 | You can still do a lot of it by quick associations

01:28:51.540 | without having to go into the depth.

01:28:53.860 | And by the time you will be right, right?

01:28:55.900 | You can just do quick associations,

01:28:57.500 | but I can easily create tricky situations for you

01:29:00.220 | where that quick associations is wrong

01:29:02.340 | and you have to actually run the simulation.

01:29:04.940 | - So the figuring out how these concepts connect,

01:29:08.340 | do I have a good idea of how to do that?

01:29:11.980 | - That's exactly what-

01:29:13.300 | - That's the-

01:29:14.140 | - One of the problems that we are working on.

01:29:15.700 | And the way we are approaching that is basically saying,

01:29:20.100 | okay, you need to,

01:29:21.540 | so the takeaway is that language is simulation control

01:29:26.540 | and your perceptual plus motor system

01:29:31.380 | is building a simulation of the world.

01:29:34.140 | And so that's basically the way we are approaching it.

01:29:37.820 | And the first thing that we built

01:29:39.380 | was a controllable perceptual system.

01:29:42.220 | And we built a schema networks,

01:29:43.860 | which was a controllable dynamic system.

01:29:46.260 | Then we built a concept learning system

01:29:48.660 | that puts all these things together into programs,

01:29:52.260 | as abstractions that you can run and simulate.

01:29:55.420 | And now we are taking the step of connecting it to language.

01:29:58.140 | And it will be very simple examples initially.

01:30:02.660 | It will not be the GPT-3 like examples,

01:30:05.060 | but it will be grounded simulation-based language.

01:30:08.620 | - And for like the querying would be

01:30:12.180 | like question answering kind of thing?

01:30:13.940 | - Correct, correct.

01:30:14.900 | And it will be in some simple world initially

01:30:17.940 | on, you know, but it will be about,

01:30:20.580 | okay, can the system connect the language

01:30:22.940 | and ground it in the right way

01:30:25.300 | and run the right simulations to come up with the answer.

01:30:27.820 | - And the goal is to try to do things that,

01:30:29.700 | for example, GPT-3 couldn't do.

01:30:31.820 | - Correct.

01:30:33.180 | - Speaking of which,

01:30:35.020 | if we could talk about GPT-3 a little bit,

01:30:39.060 | I think it's an interesting thought provoking

01:30:43.740 | set of ideas that OpenAI is pushing forward.

01:30:46.140 | I think it's good for us to talk about the limits

01:30:49.220 | and the possibilities in neural network.

01:30:50.900 | So in general, what are your thoughts

01:30:52.860 | about this recently released very large

01:30:56.700 | 175 billion parameter language model?

01:30:59.980 | - So I haven't directly evaluated it yet.

01:31:03.340 | From what I have seen on Twitter

01:31:05.220 | and other people evaluating it,

01:31:07.020 | it looks very intriguing.

01:31:08.140 | You know, I am very intrigued

01:31:09.540 | by some of the properties it is displaying.

01:31:12.140 | And of course the text generation part of that

01:31:16.380 | was already evident in GPT-2,

01:31:19.260 | you know, that it can generate coherent text

01:31:21.260 | over long distances.

01:31:24.100 | But of course the weaknesses are also pretty visible

01:31:28.860 | in saying that, okay,

01:31:29.820 | it is not really carrying a world state around.

01:31:32.700 | And, you know, sometimes you get sentences like,

01:31:36.180 | I went up the hill to reach the valley or the thing.

01:31:39.380 | You know, some completely incompatible statements.

01:31:43.260 | Or when you're traveling from one place to the other,

01:31:46.260 | it doesn't take into account the time of travel,

01:31:48.020 | things like that.

01:31:48.860 | So those things I think will happen less in GPT-3

01:31:52.780 | because it is trained on even more data.

01:31:55.100 | And so, and it can do even more longer distance coherence.

01:32:00.100 | But it will still have the fundamental limitations

01:32:04.740 | that it doesn't have a world model

01:32:07.780 | and it can't run simulations in its head

01:32:09.780 | to find whether something is true in the world or not.

01:32:13.420 | - Do you think within,

01:32:15.340 | so it's taking a huge amount of text from the internet

01:32:17.900 | and forming a compressed representation.

01:32:20.580 | Do you think in that could emerge

01:32:23.860 | something that's an approximation of a world model,

01:32:27.780 | which essentially could be used for reasoning?

01:32:30.180 | I mean, it's a,

01:32:31.020 | I'm not talking about GPT-3,

01:32:34.260 | I'm talking about GPT-4, 5, and GPT-10.

01:32:37.580 | - Yeah, I mean, they will look more impressive than GPT-3.

01:32:40.780 | So you can, if you take that to the extreme,

01:32:42.980 | then a Markov chain of just first order,

01:32:47.140 | and if you go to,

01:32:49.180 | I'm taking it the other extreme.

01:32:52.020 | If you read Shannon's book, right?

01:32:55.340 | He has a model of English text,

01:32:58.060 | which is based on first order Markov chains,

01:33:00.500 | second order Markov chains, third order Markov chains,

01:33:02.380 | and saying that, okay, third order Markov chains

01:33:04.260 | look better than first order Markov chains, right?

01:33:07.660 | So does that mean a first order Markov chain

01:33:10.540 | has a model of the world?

01:33:12.820 | Yes, it does.

01:33:14.300 | So yes, in that level,

01:33:16.780 | when you go higher order models

01:33:19.260 | or more sophisticated structure in the model,

01:33:22.700 | like the transformer networks have,

01:33:24.260 | yes, they have a model of the text world,

01:33:26.860 | but that is not a model of the world.

01:33:31.980 | It's a model of the text world,

01:33:33.460 | and it will have interesting properties

01:33:37.620 | and it will be useful,

01:33:39.060 | but just scaling it up is not going to give us AGI

01:33:44.060 | or natural language understanding or meaning.

01:33:48.300 | - The question is whether being forced

01:33:53.500 | to compress a very large amount of text

01:33:56.900 | forces you to construct things that are very much like,

01:34:03.180 | 'cause the ideas of concepts and meaning is a spectrum.

01:34:07.180 | So in order to form that kind of compression,

01:34:12.780 | maybe it will be forced to figure out abstractions

01:34:18.900 | which look awfully a lot like the kind of things

01:34:24.340 | that we think about as concepts,

01:34:27.380 | as world models, as common sense.

01:34:29.660 | Is that possible?

01:34:31.180 | - No, I don't think it is possible

01:34:32.540 | because the information is not there.

01:34:34.460 | - The information is there behind the text, right?

01:34:38.740 | - No, unless somebody has written down all the details

01:34:41.740 | about how everything works in the world

01:34:44.580 | to the absurd amounts like,

01:34:46.660 | okay, it is easier to walk forward than backward,

01:34:50.460 | that you have to open the door to go out of the thing,

01:34:53.100 | doctors wear underwear,

01:34:55.380 | unless all these things somebody has written down somewhere

01:34:57.740 | or somehow the program found it to be useful

01:35:00.380 | for compression from some other text,

01:35:02.580 | the information is not there.

01:35:05.380 | - That's an argument that like text is a lot lower fidelity

01:35:09.020 | than the experience of our physical world.

01:35:13.180 | - Correct, correct.

01:35:14.300 | Picture is worth a thousand words,

01:35:15.860 | like that kind of thing.

01:35:17.580 | - Well, in this case, pictures aren't really,

01:35:19.780 | so the richest aspect of the physical world

01:35:23.740 | isn't even just pictures,

01:35:25.180 | it's the interactivity with the world.

01:35:28.460 | - Exactly.

01:35:29.300 | - It's being able to interact.

01:35:33.900 | It's almost like,

01:35:35.020 | it's almost like if you could interact,

01:35:39.620 | so I disagree, well, maybe I agree with you

01:35:42.500 | that pictures worth a thousand words,

01:35:43.940 | but a thousand--

01:35:45.940 | - It's still, yeah, you could say,

01:35:47.300 | you could capture it with a GPT-X.

01:35:49.940 | - So I wonder if there's some interactive element

01:35:52.140 | where a system could live in text world

01:35:54.300 | where it could be part of the chat,

01:35:57.220 | be part of talking to people.

01:36:00.460 | It's interesting, I mean, fundamentally,

01:36:03.140 | so you're making a statement about the limitation of text.

01:36:07.620 | Okay, so let's say we have a text corpus

01:36:11.500 | that includes basically every experience

01:36:16.500 | we could possibly have.

01:36:18.220 | I mean, just a very large corpus of text

01:36:20.260 | and also interactive components.

01:36:23.300 | I guess the question is whether

01:36:24.660 | the neural network architecture,

01:36:26.220 | these very simple transformers,

01:36:28.700 | but if they had like hundreds of trillions

01:36:32.380 | or whatever comes after trillion parameters,

01:36:36.420 | whether that could store the information needed,

01:36:41.420 | that's architecturally.

01:36:43.940 | Do you have thoughts about the limitation

01:36:46.180 | on that side of things with neural networks?

01:36:49.300 | - I mean, so transformer is still

01:36:51.900 | a feed-forward neural network.

01:36:54.420 | It has a very interesting architecture

01:36:57.140 | which is good for text modeling

01:36:59.060 | and probably some aspects of video modeling,

01:37:01.660 | but it is still a feed-forward architecture.

01:37:03.980 | - You believe in the feedback mechanism, recursion.

01:37:07.020 | - Oh, and also causality,

01:37:10.460 | being able to do counterfactual reasoning,

01:37:12.860 | being able to do interventions,

01:37:15.020 | which is actions in the world.

01:37:18.900 | So all those things require different kinds

01:37:22.020 | of models to be built.

01:37:24.180 | I don't think transformers captures that family.

01:37:28.900 | It is very good at statistical modeling of text

01:37:32.420 | and it will become better and better with more data,

01:37:36.540 | bigger models, but that is only going to get so far.

01:37:40.740 | Finally, when you, so I had this joke on Twitter

01:37:44.780 | saying that, "Hey, this is a model that has read

01:37:47.900 | "all of quantum mechanics and theory of relativity

01:37:52.100 | "and we are asking it to do text completion

01:37:54.260 | "or we are asking it to solve simple puzzles."

01:37:57.420 | That's, when you have AGI,

01:38:00.980 | that's not what you ask the system to do.

01:38:02.460 | If it does, we'll ask the system to do experiments,

01:38:05.660 | what should, and come up with hypothesis

01:38:09.380 | and revise the hypothesis based on evidence

01:38:11.820 | from experiments, all those things, right?

01:38:13.780 | Those are the things that we want the system to do

01:38:15.460 | when we have AGI, not solve simple puzzles.

01:38:20.100 | - Like impressive demo, somebody generating a red button

01:38:23.340 | in HTML.

01:38:24.420 | - Which are all useful,

01:38:27.220 | like there's no dissing the usefulness of it.

01:38:30.060 | - So I get, by the way, I'm playing a little bit

01:38:32.620 | of a devil's advocate, so calm down internet.

01:38:36.460 | So I just, I'm curious almost,

01:38:40.920 | in which ways will a dumb,

01:38:44.700 | but large neural network will surprise us?

01:38:48.260 | - Yeah.

01:38:49.100 | I'm, it's kind of your,

01:38:51.020 | I completely agree with your intuition,

01:38:53.480 | it's just that I don't want to dogmatically,

01:38:57.540 | like 100% put all the chips there.

01:39:01.420 | We've been surprised so much,

01:39:03.580 | even the current GPT-2 and 3 are so surprising.

01:39:08.320 | - Yeah.

01:39:09.900 | - The self-play mechanisms of alpha zero

01:39:13.300 | are really surprising.

01:39:15.140 | And I, reinforcement, the fact that reinforcement learning

01:39:19.860 | works at all to me is really surprising.

01:39:22.060 | The fact that neural networks work at all

01:39:23.740 | is quite surprising, given how nonlinear the space is,

01:39:27.820 | the fact that it's able to find local minima

01:39:30.860 | that are at all reasonable, it's very surprising.

01:39:33.500 | So it's, I wonder sometimes

01:39:37.100 | whether us humans just want it to not,

01:39:44.300 | for AGI not to be such a dumb thing.

01:39:46.960 | (laughing)

01:39:48.020 | So I just,

01:39:48.860 | 'cause exactly what you're saying is like,

01:39:52.820 | the ideas of concepts and be able to reason

01:39:54.900 | with those concepts and connect those concepts

01:39:57.620 | in like hierarchical ways,

01:40:00.260 | and then to be able to have world models.

01:40:03.500 | I mean, just everything we're describing

01:40:05.580 | in human language in this poetic way seems to make sense,

01:40:09.460 | that that is what intelligence and reasoning are like.

01:40:12.140 | I wonder if at the core of it, it could be much dumber.

01:40:15.580 | - Well, finally it is still connections

01:40:18.500 | and messages passing over, right?

01:40:20.220 | - Right.

01:40:21.060 | - So in that way it's dumb.

01:40:22.220 | (laughing)

01:40:23.780 | - So I guess the recursion, the feedback mechanism,

01:40:27.660 | that does seem to be a fundamental kind of thing.

01:40:30.200 | Yeah, yeah.

01:40:32.820 | The idea of concepts, also memory.

01:40:36.140 | - Correct.

01:40:36.960 | Yeah, having an episodic memory.

01:40:38.620 | - Yeah.

01:40:39.460 | That seems to be an important thing.

01:40:41.460 | - So how do we get memory?

01:40:43.140 | - So yeah, we have another piece of work

01:40:45.180 | which came out recently on how do you form episodic memories

01:40:49.820 | and form abstractions from them?

01:40:52.260 | And we haven't figured out all the connections of that

01:40:55.980 | to the overall cognitive architecture, but-

01:40:58.340 | - Well, yeah, what are your ideas

01:41:00.340 | about how you could have episodic memory?

01:41:03.100 | - So at least it's very clear that you need

01:41:06.260 | to have two kinds of memory, right?

01:41:07.660 | That's very, very clear, right?

01:41:09.660 | There are things that happen

01:41:12.340 | as statistical patterns in the world,

01:41:16.260 | but then there is the one timeline of things

01:41:19.580 | that happen only once in your life, right?

01:41:22.100 | And this day is not going to happen ever again.

01:41:24.620 | And that needs to be stored as just a stream of strings,

01:41:29.620 | right?

01:41:32.180 | This is my experience.

01:41:33.140 | And then the question is about how do you take

01:41:37.420 | that experience and connect it to the statistical part

01:41:39.820 | of it?

01:41:40.660 | How do you now say that, okay, I experienced this thing.

01:41:43.540 | Now I want to be careful about similar situations.

01:41:47.460 | And so you need to be able to index that similarity

01:41:53.140 | using your other giant statistics,

01:41:56.420 | the model of the world that you have learned.

01:41:59.180 | Although the situation came from the episode,

01:42:01.420 | you need to be able to index the other one.

01:42:03.340 | So the episodic memory being implemented

01:42:08.340 | as an indexing over the other model that you're building.

01:42:14.220 | - So the memories remain and they're an index

01:42:20.260 | into this, like the statistical thing that you formed.

01:42:24.820 | - Yeah, statistical causal structural model

01:42:26.780 | that you've built over time.

01:42:28.900 | So it's basically the idea is that the hippocampus

01:42:32.940 | is just storing or sequencing in a set of pointers

01:42:37.940 | that happens over time.

01:42:42.380 | And then whenever you want to reconstitute that memory

01:42:45.060 | and evaluate the different aspects of it,

01:42:49.180 | whether it was good, bad,

01:42:50.420 | do I need to encounter the situation again?

01:42:52.620 | You need the cortex to re-instantiate,

01:42:56.540 | to replay that memory.

01:42:57.660 | - So how do you find that memory?

01:43:00.500 | Which direction is the important direction?

01:43:03.300 | - Both directions are again bidirectional.

01:43:06.060 | - I guess, how do you retrieve the memory?

01:43:09.860 | - So this is again hypothesis, right?

01:43:11.580 | We're making this up.

01:43:12.420 | So when you come to a new situation,

01:43:15.900 | your cortex is doing inference over in the new situation.

01:43:21.340 | And then of course, hippocampus is connected

01:43:23.940 | to different parts of the cortex.

01:43:25.980 | And you have this deja vu situation, right?

01:43:29.700 | Okay, I have seen this thing before.

01:43:32.060 | And then in the hippocampus,

01:43:35.420 | you can have an index of,

01:43:36.940 | okay, this is when it happened as a timeline.

01:43:40.020 | Then you can use the hippocampus

01:43:44.540 | to drive the similar timelines to say,

01:43:47.780 | now I am rather than being driven

01:43:50.460 | by my current input stimuli,

01:43:53.060 | I am going back in time

01:43:54.620 | and rewinding my experience from there.

01:43:56.820 | - Replaying it.

01:43:57.660 | - But putting back into the cortex.

01:43:58.940 | And then putting it back into the cortex,

01:44:00.980 | of course affects what you're going to see next

01:44:03.820 | in your current situation.

01:44:05.100 | - Got it, yeah.

01:44:05.980 | So that's the whole thing,

01:44:07.540 | having a world model and then, yeah.

01:44:10.340 | Connecting to the perception.

01:44:11.820 | Yeah, it does seem to be that that's what's happening.

01:44:14.420 | It'd be, on the neural network side,

01:44:17.660 | it's interesting to think of how we actually do that.

01:44:21.700 | - Yeah, yeah.

01:44:23.380 | - To have a knowledge base.

01:44:24.900 | - Yes, it is possible that you can put many

01:44:27.340 | of these structures into neural networks

01:44:30.420 | and we will find ways of combining properties

01:44:34.740 | of neural networks and graphical models.

01:44:38.420 | So, I mean, it's already started happening.

01:44:41.580 | Graph neural networks are kind of a merge between them.

01:44:44.620 | And there will be more of that.

01:44:46.660 | So, but to me, the direction is pretty clear.

01:44:50.740 | I mean, looking at biology and the history

01:44:53.660 | of evolutionary history of intelligence,

01:44:57.820 | it is pretty clear that, okay,

01:45:00.420 | what we need is more structure in the models

01:45:03.580 | and modeling of the world

01:45:05.660 | and supporting dynamic inference.

01:45:08.100 | - Well, let me ask you, there's a guy named Elon Musk.

01:45:12.820 | There's a company called Neuralink

01:45:14.540 | and there's a general field called

01:45:15.980 | Brain-Computer Interfaces.

01:45:18.300 | It's kind of an interface between your two loves.

01:45:23.140 | - Yes.

01:45:23.980 | - The brain and the intelligence.

01:45:25.700 | So, there's like very direct applications

01:45:28.220 | of Brain-Computer Interfaces for people

01:45:31.300 | with different conditions, more in the short term.

01:45:33.620 | But there's also these sci-fi futuristic kinds of ideas

01:45:36.620 | of AI systems being able to communicate

01:45:41.620 | in a high bandwidth way with the brain, bi-directional.

01:45:46.540 | What are your thoughts about Neuralink

01:45:49.300 | and BCI in general as a possibility?

01:45:52.500 | - So, I think BCI is a cool research area.

01:45:56.540 | And in fact, when I got interested in brains initially,

01:46:01.540 | when I was enrolled at Stanford

01:46:03.260 | and when I got interested in brains,

01:46:04.700 | it was through a brain-computer interface talk

01:46:09.060 | that Krishna Shenoy gave.

01:46:10.540 | That's when I even started thinking about the problem.

01:46:13.140 | So, it is definitely a fascinating research area

01:46:16.500 | and the applications are enormous, right?

01:46:19.740 | So, there's the science fiction scenario

01:46:22.380 | of brains directly communicating.

01:46:24.460 | Let's keep that aside for the time being.

01:46:27.700 | Even just the intermediate milestones they're pursuing,

01:46:30.940 | which are very reasonable as far as I can see,

01:46:33.980 | being able to control an external limb

01:46:36.420 | using direct connections from the brain

01:46:40.700 | and being able to write things into the brain.

01:46:43.740 | So, those are all good steps to take

01:46:47.940 | and they have enormous applications.

01:46:50.620 | People losing limbs being able to control prosthetics,

01:46:54.620 | quadriplegics being able to control something,

01:46:57.340 | so, and therapeutics.

01:46:59.180 | And I also know about another company

01:47:01.220 | working in this space called Paradromics.

01:47:03.820 | They're based on a different electrode array,

01:47:08.260 | but trying to attack some of the same problems.

01:47:10.260 | So, I think it's a very-

01:47:11.980 | - Also surgery?

01:47:13.780 | - Correct, surgically implanted electrodes, yeah.

01:47:16.580 | So, yeah, I think of it as a very, very promising field,

01:47:21.460 | especially when it is helping people

01:47:23.780 | overcome some limitations.

01:47:25.580 | Now, at some point, of course,

01:47:27.660 | it will advance the level of being able to communicate.

01:47:31.460 | - How hard is that problem, do you think?

01:47:33.380 | Like, so, okay, let's say we magically solve

01:47:37.580 | what I think is a really hard problem

01:47:39.980 | of doing all of this safely.

01:47:42.260 | - Yeah.

01:47:43.300 | - So, like, being able to connect electrodes,

01:47:46.940 | and not just thousands, but like millions to the brain.

01:47:50.660 | - I think it's very, very hard

01:47:52.100 | because you also do not know

01:47:54.620 | what will happen to the brain with that, right?

01:47:57.660 | In the sense of how does the brain adapt

01:47:59.260 | to something like that?

01:48:00.100 | - And it's, you know, as we were learning,

01:48:02.220 | the brain is quite, in terms of neuroplasticity,

01:48:06.220 | is pretty malleable.

01:48:07.460 | - Correct.

01:48:08.300 | - So, it's gonna adjust.

01:48:09.260 | - Correct.

01:48:10.300 | - So, the machine learning side,

01:48:11.620 | the computer side is gonna adjust,

01:48:13.260 | and then the brain's gonna adjust.

01:48:14.660 | - Exactly, and then what soup does this land us into?

01:48:17.460 | - The kind of hallucinations you might get from this

01:48:21.900 | that might be pretty intense.

01:48:23.180 | - Yeah, yeah.

01:48:24.620 | - Just connecting to all of Wikipedia.

01:48:27.100 | It's interesting whether we need to be able to figure out

01:48:30.780 | the basic protocol of the brain's communication schemes

01:48:35.140 | in order to get them to, the machine and the brain to talk.

01:48:38.940 | 'Cause another possibility is the brain

01:48:41.300 | actually just adjusts to whatever the heck

01:48:42.980 | the computer is doing.

01:48:43.820 | - Exactly, that's the way I think,

01:48:45.740 | I find that to be a more promising way.

01:48:48.140 | It's basically saying, you know,

01:48:50.380 | okay, attach electrodes to some part of the cortex, okay?

01:48:53.340 | And maybe if it is done from birth,

01:48:56.980 | the brain will adapt it such that, you know,

01:48:59.060 | that part is not damaged, it was not used for anything.

01:49:01.580 | These electrodes are attached there, right?

01:49:03.020 | And now, you train that part of the brain

01:49:06.780 | to do this high bandwidth communication

01:49:08.580 | between something else, right?

01:49:10.540 | And if you do it like that, then it is brain adapting to,

01:49:15.260 | and of course, your external system is designed

01:49:17.340 | such that it is adaptable.

01:49:18.580 | Just like we design computers or mouse, keyboard,

01:49:22.420 | all of them to be interacting with humans.

01:49:26.860 | So of course, that feedback system is designed

01:49:29.540 | to be human compatible, but now it is not trying to record

01:49:34.540 | from all of the brain and now, you know,

01:49:39.780 | two system trying to adapt to each other.

01:49:41.660 | It's the brain adapting into one way.

01:49:44.340 | - That's fascinating.

01:49:45.660 | The brain is connected to like the internet.

01:49:48.140 | It's connected.

01:49:49.140 | - Yeah.

01:49:50.340 | - Just imagine, it's connecting it to Twitter

01:49:52.260 | and just taking that stream of information.

01:49:56.500 | Yeah, but again, if we take a step back,

01:50:00.620 | I don't know what your intuition is.

01:50:02.460 | I feel like that is not as hard of a problem

01:50:06.700 | as doing it safely.

01:50:10.140 | There's a huge barrier to surgery.

01:50:14.980 | - Right.

01:50:15.820 | - 'Cause the biological system,

01:50:17.620 | it's a mush of like weird stuff.

01:50:20.900 | - Correct.

01:50:21.740 | So that, the surgery part of it, biology part of it,

01:50:25.100 | the long-term repercussions part of it,

01:50:27.820 | again, I don't know what else will,

01:50:29.620 | we often find after a long time in biology that,

01:50:35.580 | okay, that idea was wrong, right?

01:50:37.620 | So people used to cut off this,

01:50:40.260 | the gland called the thymus or something.

01:50:43.780 | And then they found that,

01:50:45.820 | oh no, that actually causes cancer.

01:50:48.300 | (both laughing)

01:50:50.660 | - And then there's a subtle,

01:50:51.980 | like millions of variables involved.

01:50:54.140 | But this whole process, the nice thing,

01:50:56.660 | just like, again, with Elon, just like colonizing Mars,

01:51:00.980 | seems like a ridiculously difficult idea.

01:51:03.500 | But in the process of doing it,

01:51:05.500 | we might learn a lot about the biology,

01:51:08.420 | the neurobiology of the brain,

01:51:10.380 | the neuroscience side of things.

01:51:11.900 | It's like, if you wanna learn something,

01:51:14.380 | do the most difficult version of it.

01:51:16.780 | - Yeah.

01:51:17.620 | - And see what you learn.

01:51:18.540 | - The intermediate steps that they are taking

01:51:20.780 | sounded all very reasonable to me.

01:51:22.500 | - Yeah, it's great.

01:51:24.180 | Well, but like everything with Elon

01:51:26.260 | is the timeline seems insanely fast, so.

01:51:28.940 | - Right.

01:51:29.780 | - That's the only awful question.

01:51:33.220 | - Well, we've been talking about cognition a little bit,

01:51:36.220 | so like reasoning.

01:51:37.380 | We haven't mentioned the other C word,

01:51:40.460 | which is consciousness.

01:51:42.380 | Do you ever think about that one?

01:51:43.940 | Is that useful at all in this whole context

01:51:47.340 | of what it takes to create an intelligent reasoning being?

01:51:52.340 | Or is that completely outside of your,

01:51:55.780 | like the engineering perspective of intelligence?

01:51:58.540 | - It is not outside the realm,

01:52:00.180 | but it doesn't on a day-to-day basis inform what we do,

01:52:05.180 | but it's more, so in many ways,

01:52:08.180 | the company name is connected to this idea of consciousness.

01:52:12.820 | - What's the company name?

01:52:13.860 | - Vicarious, so Vicarious is the company name.

01:52:16.620 | So what does Vicarious mean?

01:52:20.180 | At the first level, it is about modeling the world,

01:52:25.100 | and it is internalizing the external actions.

01:52:29.460 | So you interact with the world

01:52:31.500 | and learn a lot about the world.

01:52:33.060 | And now, after having learned a lot about the world,

01:52:36.500 | you can run those things in your mind

01:52:40.180 | without actually having to act in the world.

01:52:42.820 | So you can run things vicariously, just in your brain.

01:52:47.180 | And similarly, you can experience another person's thoughts

01:52:51.460 | by having a model of how that person works

01:52:54.660 | and running there, putting yourself

01:52:58.020 | in some other person's shoes.

01:52:59.740 | So that is being vicarious.

01:53:01.380 | Now, it's the same modeling apparatus

01:53:04.580 | that you're using to model the external world

01:53:06.940 | or some other person's thoughts.

01:53:08.940 | You can turn it to yourself.

01:53:11.260 | You can, if that same modeling thing

01:53:13.900 | is applied to your own modeling apparatus,

01:53:17.620 | then that is what gives rise to consciousness, I think.

01:53:21.140 | - Well, that's more like self-awareness.

01:53:23.660 | There's the hard problem of consciousness,

01:53:25.540 | which is when the model feels like something,

01:53:30.540 | when this whole process is like,

01:53:34.760 | it's like you really are in it.

01:53:39.180 | You feel like an entity in this world.

01:53:41.860 | Not just you know that you're an entity,

01:53:44.100 | but it feels like something to be that entity.

01:53:50.580 | And thereby, we attribute this,

01:53:53.220 | then it starts to be where in something

01:53:56.260 | that has consciousness can suffer.

01:53:58.500 | You start to have these kinds of things

01:54:00.180 | that we can reason about that.

01:54:02.340 | - Yes.

01:54:03.180 | - It's much heavier.

01:54:06.460 | It seems like there's much greater cost of your decisions.

01:54:11.020 | And mortality is tied up into that.

01:54:14.540 | The fact that these things end.

01:54:16.940 | - Right.

01:54:18.260 | - First of all, I end at some point,

01:54:20.340 | and then other things end.

01:54:22.460 | And that somehow seems to be,

01:54:25.940 | at least for us humans, a deep motivator.

01:54:30.260 | - Yes.

01:54:31.380 | - And that idea of motivation in general,

01:54:34.300 | we talk about goals in AI,

01:54:35.820 | but goals aren't quite the same thing as our mortality.

01:54:40.820 | It feels like first of all, humans don't have a goal.

01:54:47.540 | And they just kind of create goals at different levels.

01:54:50.540 | They like make up goals,

01:54:52.980 | because we're terrified by the mystery

01:54:56.180 | of the thing that gets us all.

01:54:59.020 | So we make these goals up.

01:55:02.020 | So we're like a goal generation machine,

01:55:05.180 | as opposed to a machine which optimizes

01:55:07.980 | the trajectory towards a singular goal.

01:55:10.860 | So it feels like that's an important part of cognition,

01:55:15.700 | that whole mortality thing.

01:55:17.500 | - Well, it is a part of human cognition,

01:55:21.500 | but there is no reason for that mortality

01:55:26.500 | to come to the equation for an artificial system,

01:55:31.500 | because we can copy the artificial system.

01:55:35.540 | The problem with humans is that I can't clone you.

01:55:38.300 | Even if I clone you as the hardware,

01:55:44.220 | your experience that was stored in your brain,

01:55:47.380 | or your episodic memory,

01:55:49.660 | all those will not be captured in the new clone.

01:55:52.060 | But that's not the same with an AI system.

01:55:56.100 | - But it's also possible that the thing

01:56:01.020 | that you mentioned with us humans

01:56:03.820 | is actually of fundamental importance for intelligence.

01:56:06.940 | So the fact that you can copy an AI system

01:56:10.380 | means that that AI system is not yet an AGI.

01:56:16.140 | - So if you look at existence proof,

01:56:18.780 | if we reason based on existence proof,

01:56:21.860 | you could say that it doesn't feel like death

01:56:23.980 | is a fundamental property of an intelligent system,

01:56:27.300 | but we don't yet, give me an example

01:56:31.900 | of an immortal intelligent being, we don't have those.

01:56:36.500 | It's very possible that that is a fundamental property

01:56:42.540 | of intelligence is a thing that has a deadline for itself.

01:56:47.540 | (both laughing)

01:56:48.420 | - You can think of it like this.

01:56:49.860 | Suppose you invent a way to freeze people for a long time.

01:56:54.500 | It's not dying, right?

01:56:56.460 | So you can be frozen and woken up

01:56:59.780 | thousands of years from now.

01:57:01.940 | So it's no fear of death.

01:57:03.300 | - Well, no, you're still, it's not about time.

01:57:08.980 | It's about the knowledge that it's temporary.

01:57:12.560 | And that aspect of it, the finiteness of it,

01:57:17.560 | I think creates a kind of urgency.

01:57:21.680 | - Correct, for us, for humans.

01:57:23.640 | - Yeah, for humans.

01:57:24.480 | - Yes, and that is part of our drives.

01:57:28.040 | But, and that's why I'm not too worried about AI,

01:57:32.560 | having motivations to kill all humans

01:57:38.040 | and those kinds of things.

01:57:39.400 | Why, just wait.

01:57:40.800 | (both laughing)

01:57:43.460 | Why do you need to do that?

01:57:45.240 | - Yeah, I've never heard that before.

01:57:46.640 | That's a good point.

01:57:48.480 | Yeah, just murder seems like a lot of work.

01:57:53.040 | Just wait it out.

01:57:54.260 | They'll probably hurt themselves.

01:57:57.360 | Let me ask you, people often kind of wonder,

01:58:01.520 | world-class researchers such as yourself,

01:58:05.520 | what kind of books, technical, fiction, philosophical,

01:58:09.920 | were, had an impact on you in your life

01:58:14.040 | and maybe ones you could possibly recommend

01:58:19.040 | that others read, maybe if you have three books

01:58:21.600 | that pop into mind.

01:58:23.280 | - Yeah, so I definitely liked Judea Pearl's book,

01:58:27.280 | Probabilistic Reasoning and Intelligent Systems.

01:58:29.840 | It's a very deep technical book,

01:58:32.840 | but what I liked is that,

01:58:34.680 | so there are many places where you can learn

01:58:37.080 | about probabilistic graphical models from.

01:58:39.720 | But throughout this book, Judea Pearl kind of sprinkles

01:58:43.160 | his philosophical observations and he thinks about,

01:58:46.880 | connects us to how the brain thinks and attentions

01:58:50.040 | and resources, all those things.

01:58:51.520 | So that whole thing makes it more interesting to read.

01:58:55.280 | - He emphasizes the importance of causality.

01:58:57.960 | - So that was in his later book.

01:58:59.520 | So this was the first book,

01:59:00.840 | Probabilistic Reasoning and Intelligent Systems.

01:59:02.560 | He mentions causality, but he hadn't really

01:59:05.760 | sunk his teeth into, like, you know,

01:59:07.760 | how do you actually formalize--

01:59:09.200 | - Got it.

01:59:10.040 | - Yeah, and the second book, Causality,

01:59:12.520 | it was the one in 2000, that one is really hard.

01:59:16.240 | So I wouldn't recommend that.

01:59:17.840 | - Oh yeah, so that looks at the, like,

01:59:19.320 | the mathematical, like, his model of--

01:59:22.680 | - Due calculus.

01:59:23.520 | - Due calculus, yeah, it was pretty dense mathematically.

01:59:25.680 | - Right, right, right.

01:59:27.120 | The book of Y is definitely more enjoyable.

01:59:29.040 | - Oh, for sure.

01:59:29.880 | - Yeah, so yeah, so I would recommend

01:59:32.000 | Probabilistic Reasoning and Intelligent Systems.

01:59:34.400 | Another book I liked was one from Doug Hofstadter.

01:59:39.320 | This was a long time ago.

01:59:40.160 | He has a book, he had a book, I think,

01:59:41.800 | called, it was called The Mind's Eye.

01:59:43.600 | It was probably Hofstadter and Daniel Dennett together.

01:59:48.600 | - Yeah, and I actually was, I bought that book.

01:59:53.360 | It's on my shelf, I haven't read it yet.

01:59:55.040 | But I couldn't get an electronic version of it,

01:59:58.440 | which is annoying, 'cause you read everything on Kindle.

02:00:01.840 | - Oh, okay.

02:00:02.680 | - So you have to actually purchase the physical.

02:00:05.080 | It's like one of the only physical books I have,

02:00:07.200 | 'cause anyway, a lot of people recommended it highly, so.

02:00:10.880 | - Yeah, and the third one I would definitely

02:00:13.800 | recommend reading is, this is not a technical book.

02:00:17.800 | It is history.

02:00:19.520 | It's called, the name of the book, I think,

02:00:22.320 | is Bishop's Boys.

02:00:23.480 | It's about Wright brothers and their path

02:00:28.520 | and how it was, there are multiple books on this topic

02:00:33.520 | and all of them are great.

02:00:35.640 | It's fascinating how flight was treated

02:00:40.640 | as an unsolvable problem.

02:00:44.680 | And also, what aspects did people emphasize?

02:00:49.040 | People thought, oh, it is all about just powerful engines.

02:00:54.480 | Just need to have powerful, lightweight engines.

02:00:58.000 | And so, some people thought of it as,

02:01:01.960 | how far can we just throw the thing?

02:01:04.520 | Just throw it.

02:01:05.360 | - Like a catapult.

02:01:06.360 | - Yeah, so it's a very fascinating,

02:01:09.600 | and even after they made the invention,

02:01:13.200 | people not believing it.

02:01:14.400 | - The social aspect of it, yeah.

02:01:17.040 | - The social aspect, it's very fascinating.

02:01:20.160 | - Do you draw any parallels between birds fly?

02:01:25.400 | So there's the natural approach to flight

02:01:28.440 | and then there's the engineered approach.

02:01:30.600 | Do you see the same kind of thing with the brain

02:01:34.080 | and our trying to engineer intelligence?

02:01:36.440 | - Yeah, it's a good analogy to have.

02:01:40.600 | Of course, all analogies have their--

02:01:43.720 | - Limits, for sure.

02:01:46.040 | - So people in AI often use airplanes as an example of,

02:01:51.040 | hey, we didn't learn anything from birds.

02:01:54.200 | Look, but the funny thing is that,

02:01:57.520 | and the saying is, airplanes don't flap wings.

02:02:01.760 | This is what they say.

02:02:03.280 | The funny thing and the ironic thing is that,

02:02:05.680 | that you don't need to flap to fly

02:02:09.000 | is something Wright Brothers found by observing birds.

02:02:12.240 | (both laughing)

02:02:14.920 | In their notebook, in some of these books,

02:02:18.800 | they show their notebook drawings,

02:02:20.840 | they make detailed notes about buzzards

02:02:24.120 | just soaring over thermals.

02:02:27.040 | And they basically say, look,

02:02:28.800 | flapping is not the important,

02:02:30.600 | propulsion is not the important problem to solve here.

02:02:33.320 | We want to solve control.

02:02:35.520 | And once you solve control,

02:02:37.320 | propulsion will fall into place.

02:02:38.880 | All of these are people,

02:02:40.840 | they realize this by observing birds.

02:02:43.040 | (both laughing)

02:02:44.560 | - Beautifully put.

02:02:46.120 | That's actually brilliant.

02:02:47.600 | 'Cause people do use that analogy a lot.

02:02:49.360 | I'm gonna have to remember that one.

02:02:51.400 | Do you have advice for people

02:02:53.640 | interested in artificial intelligence,

02:02:55.160 | like young folks today?

02:02:56.240 | I talk to undergraduate students all the time.

02:02:59.360 | Interested in neuroscience,

02:03:00.560 | interested in understanding how the brain works.

02:03:03.720 | Is there advice you would give them about their career,

02:03:06.880 | maybe about their life in general?

02:03:09.600 | - Sure.

02:03:10.440 | I think every piece of advice

02:03:12.520 | should be taken with a pinch of salt, of course.

02:03:14.880 | Because each person is different,

02:03:17.240 | their motivations are different.

02:03:18.520 | But I can definitely say,

02:03:21.360 | if your goal is to understand the brain

02:03:24.520 | from the angle of wanting to build one,

02:03:26.880 | then being an experimental neuroscientist

02:03:32.760 | might not be the way to go about it.

02:03:34.600 | A better way to pursue it might be through computer science,

02:03:41.960 | electrical engineering, machine learning, and AI.

02:03:44.360 | And of course, you have to study up the neuroscience,

02:03:46.360 | but that you can do on your own.

02:03:48.480 | If you are more attracted by finding something intriguing,

02:03:54.080 | discovering something intriguing about the brain,

02:03:56.600 | then of course it is better to be an experimentalist.

02:04:00.320 | So find that motivation, what are you intrigued by?

02:04:02.480 | And of course, find your strengths too.

02:04:04.320 | Some people are very good experimentalists,

02:04:07.680 | and they enjoy doing that.

02:04:10.120 | - And it's interesting to see which department,

02:04:13.640 | if you're picking in terms of your education path,

02:04:17.520 | whether to go with, at MIT it's brain and computer,

02:04:22.520 | no, it'd be CS.

02:04:26.960 | - Yeah.

02:04:27.920 | - Brain and cognitive sciences, yeah.

02:04:30.240 | Or the CS side of things.

02:04:32.960 | And actually, the brain folks, the neuroscience folks

02:04:36.400 | are more and more now embracing of learning TensorFlow,

02:04:41.600 | PyTorch, right?

02:04:43.080 | They see the power of trying to engineer ideas

02:04:48.080 | that they get from the brain into,

02:04:52.600 | and then explore how those could be used

02:04:54.560 | to create intelligent systems.

02:04:56.640 | So that might be the right department actually.

02:04:59.160 | - Yeah.

02:05:00.600 | So this was a question in one of the

02:05:03.240 | Redwood Neuroscience Institute workshops

02:05:05.920 | that Jeff Hawkins organized almost 10 years ago.

02:05:09.960 | This question was put to a panel, right?

02:05:11.600 | What should be the undergrad major

02:05:13.920 | you should take if you want to understand the brain?

02:05:16.200 | And the majority opinion in that one

02:05:19.040 | was electrical engineering.

02:05:21.040 | - Interesting.

02:05:23.680 | - Because, I mean, I'm a W undergrad,

02:05:25.840 | so I got lucky in that way.

02:05:27.240 | But I think it does have some of the right ingredients,

02:05:31.320 | because you learn about circuits.

02:05:33.400 | You learn about how you can construct circuits

02:05:36.960 | to approach, do functions.

02:05:40.200 | You learn about microprocessors.

02:05:42.640 | You learn information theory.

02:05:43.760 | You learn signal processing.

02:05:45.520 | You learn continuous math.

02:05:46.760 | So in that way, it's a good step

02:05:50.960 | to if you want to go to computer science or neuroscience,

02:05:54.000 | you could, it's a good step.

02:05:55.080 | - The downside, you're more likely to be forced to use MATLAB.

02:06:00.080 | (laughing)

02:06:05.000 | - One of the interesting things about,

02:06:07.920 | I mean, this is changing.

02:06:08.880 | The world is changing.

02:06:09.880 | But certain departments lagged

02:06:13.120 | on the programming side of things,

02:06:15.240 | on developing good habits in software engineering.

02:06:18.440 | But I think that's more and more changing.

02:06:20.840 | And students can take that into their own hands,

02:06:24.700 | like learn to program.

02:06:26.000 | I feel like everybody should learn to program,

02:06:30.680 | because it, like everyone in the sciences,

02:06:34.560 | 'cause it empowers, it puts the data at your fingertips.

02:06:37.840 | So you can organize it.

02:06:38.840 | You can find all kinds of things in the data.

02:06:41.440 | And then you can also, for the appropriate sciences,

02:06:44.440 | build systems that, like based on that.

02:06:48.000 | So like then engineer intelligence systems.

02:06:50.160 | We already talked about mortality.

02:06:53.160 | So we hit a ridiculous point.

02:06:57.480 | But let me ask you the,

02:07:04.520 | one of the things about intelligence is it's goal-driven.

02:07:09.520 | And you study the brain.

02:07:12.800 | So the question is like,

02:07:13.680 | what's the goal that the brain is operating under?

02:07:15.960 | What's the meaning of it all for us humans, in your view?

02:07:19.040 | What's the meaning of life?

02:07:21.200 | (laughing)

02:07:22.560 | - The meaning of life is whatever you construct out of it.

02:07:25.000 | It's completely open.

02:07:26.440 | - It's open?

02:07:27.280 | - Yeah.

02:07:28.120 | - So there's nothing, like you mentioned,

02:07:31.920 | you like constraints.

02:07:32.840 | So there's, what's, it's wide open.

02:07:36.800 | Is there some useful aspect that you think about

02:07:41.720 | in terms of like the openness of it

02:07:45.240 | and just the basic mechanisms of generating goals

02:07:48.240 | in studying cognition in the brain that you think about?

02:07:54.400 | Or is it just about, 'cause everything we've talked about,

02:07:57.080 | kind of the perception system,

02:07:58.400 | is to understand the environment.

02:07:59.880 | That's like to be able to like not die.

02:08:02.480 | - Correct, exactly.

02:08:04.120 | - Like not fall over and like be able to,

02:08:06.640 | you don't think we need to think about

02:08:11.240 | anything bigger than that?

02:08:13.200 | - Yeah, I think so.

02:08:14.040 | Because it's basically being able to understand

02:08:18.000 | the machinery of the world,

02:08:19.360 | such that you can pursue whatever goals you want, right?

02:08:23.080 | - So the machinery of the world is really ultimately

02:08:26.080 | what we should be striving to understand.

02:08:27.920 | The rest is just whatever the heck you wanna do,

02:08:31.960 | or whatever fun you--

02:08:32.800 | - World is culturally popular, you know?

02:08:34.640 | (laughing)

02:08:37.440 | - I think that's beautifully put.

02:08:41.120 | I don't think there's a better way to end it.

02:08:45.240 | I'm so honored that you would show up here

02:08:48.560 | and waste your time with me.

02:08:49.880 | It's been an awesome conversation.

02:08:51.160 | Thanks so much for talking today.

02:08:52.440 | - Oh, thank you so much.

02:08:53.520 | This was so much more fun than I expected.

02:08:56.320 | (laughing)

02:08:57.640 | Thank you.

02:08:58.480 | - Thanks for listening to this conversation

02:09:00.920 | with Delete George.

02:09:02.000 | And thank you to our sponsors,

02:09:04.080 | Babbel, Raycon Earbuds, and Masterclass.

02:09:07.760 | Please consider supporting this podcast

02:09:09.760 | by going to babbel.com and use code Lex,

02:09:13.240 | going to buyraycon.com/lex,

02:09:16.160 | and signing up at masterclass.com/lex.

02:09:19.760 | Click the links, get the discount.

02:09:21.560 | It really is the best way to support this podcast.

02:09:24.280 | If you enjoy this thing, subscribe on YouTube,

02:09:26.840 | review the Five Stars on Apple Podcast,

02:09:29.080 | support it on Patreon, or connect with me on Twitter,

02:09:32.360 | @LexFriedman, spelled, yes, without the E,

02:09:37.360 | just F-R-I-D-M-A-N.

02:09:40.040 | And now let me leave you with some words

02:09:42.160 | from Marcus Aurelius.

02:09:44.880 | You have power over your mind, not outside events.

02:09:49.180 | Realize this and you will find strength.

02:09:52.180 | Thank you for listening and hope to see you next time.

02:09:56.560 | (upbeat music)

02:09:59.140 | (upbeat music)

02:10:01.720 | [BLANK_AUDIO]

Dileep George: Brain-Inspired AI | Lex Fridman Podcast #115

Chapters