
Dileep George: Brain-Inspired AI | Lex Fridman Podcast #115


Chapters

0:00 Introduction
4:50 Building a model of the brain
17:11 Visual cortex
27:50 Probabilistic graphical models
31:35 Encoding information in the brain
36:56 Recursive Cortical Network
51:09 Solving CAPTCHAs algorithmically
66:48 Hype around brain-inspired AI
78:21 How does the brain learn?
81:32 Perception and cognition
85:43 Open problems in brain-inspired AI
90:33 GPT-3
100:41 Memory
105:08 Neuralink
111:32 Consciousness
117:59 Book recommendations
126:49 Meaning of life


00:00:00.000 | The following is a conversation with Dileep George,
00:00:03.440 | a researcher at the intersection of neuroscience
00:00:06.200 | and artificial intelligence,
00:00:07.960 | co-founder of Vicarious with Scott Phoenix,
00:00:11.000 | and formerly co-founder of Numenta with Jeff Hawkins,
00:00:14.800 | who's been on this podcast, and Donna Dubinsky.
00:00:18.880 | From his early work on hierarchical temporal memory
00:00:21.760 | to recursive cortical networks to today,
00:00:24.760 | Dileep's always sought to engineer intelligence
00:00:27.640 | that is closely inspired by the human brain.
00:00:31.240 | As a side note, I think we understand very little
00:00:34.200 | about the fundamental principles
00:00:35.920 | underlying the function of the human brain,
00:00:38.920 | but the little we do know gives hints
00:00:41.200 | that may be more useful for engineering intelligence
00:00:43.680 | than any idea in mathematics, computer science, physics,
00:00:47.160 | and scientific fields outside of biology.
00:00:50.520 | And so the brain is a kind of existence proof
00:00:52.680 | that says it's possible.
00:00:54.520 | Keep at it.
00:00:56.320 | I should also say that brain-inspired AI is often overhyped
00:01:00.600 | and used as fodder, just as quantum computing is,
00:01:03.680 | for marketing speak,
00:01:05.960 | but I'm not afraid of exploring these
00:01:08.080 | sometimes overhyped areas since where there's smoke,
00:01:11.560 | there's sometimes fire.
00:01:12.900 | Quick summary of the ads.
00:01:15.360 | Three sponsors, Babbel, Raycon Earbuds, and Masterclass.
00:01:19.960 | Please consider supporting this podcast
00:01:21.760 | by clicking the special links in the description
00:01:24.640 | to get the discount.
00:01:25.880 | It really is the best way to support this podcast.
00:01:29.160 | If you enjoy this thing, subscribe on YouTube,
00:01:31.560 | review it with five stars on Apple Podcasts,
00:01:33.760 | support on Patreon, or connect with me on Twitter
00:01:36.560 | at Lex Fridman.
00:01:37.900 | As usual, I'll do a few minutes of ads now
00:01:41.180 | and never any ads in the middle
00:01:42.440 | that can break the flow of the conversation.
00:01:45.340 | This show is sponsored by Babbel,
00:01:47.500 | an app and website that gets you speaking
00:01:50.000 | in a new language within weeks.
00:01:52.040 | Go to babbel.com and use code Lex to get three months free.
00:01:55.740 | They offer 14 languages, including Spanish, French,
00:01:59.220 | Italian, German, and yes, Russian.
00:02:03.140 | Daily lessons are 10 to 15 minutes, super easy, effective,
00:02:07.040 | designed by over 100 language experts.
00:02:10.700 | Let me read a few lines from the Russian poem,
00:02:13.540 | "Noch, ulitza, fanar, apteka," by Alexander Bloch,
00:02:18.360 | that you'll start to understand if you sign up to Babbel.
00:02:22.140 | (speaking in foreign language)
00:02:26.060 | Now, I say that you'll only start to understand this poem
00:02:39.000 | because Russian starts with a language
00:02:41.560 | and ends with a vodka.
00:02:43.700 | Now, the latter part is definitely not endorsed
00:02:46.360 | or provided by Babbel
00:02:47.800 | and will probably lose me the sponsorship,
00:02:50.080 | but once you graduate from Babbel,
00:02:51.960 | you can enroll in my advanced course
00:02:53.600 | of late night Russian conversation over vodka.
00:02:56.500 | I have not yet developed an app for that.
00:02:59.140 | It's in progress.
00:03:00.660 | So get started by visiting babbel.com
00:03:02.940 | and use code Lex to get three months free.
00:03:05.560 | This show is sponsored by Raycon earbuds.
00:03:09.500 | Get them at buyraycon.com/lex.
00:03:12.580 | They've become my main method of listening to podcasts,
00:03:15.060 | audio books, and music when I run, do pushups and pull-ups,
00:03:18.820 | or just live life.
00:03:20.740 | In fact, I often listen to Brown Noise with them
00:03:23.300 | when I'm thinking deeply about something,
00:03:25.380 | it helps me focus.
00:03:27.060 | They're super comfortable, pair easily, great sound,
00:03:30.700 | great bass, six hours of playtime.
00:03:34.080 | I've been putting in a lot of miles to get ready
00:03:36.060 | for a potential ultra marathon
00:03:38.220 | and listening to audio books on World War II.
00:03:41.500 | The sound is rich and really comes in clear.
00:03:45.940 | So again, get them at buyraycon.com/lex.
00:03:50.140 | This show is sponsored by Masterclass.
00:03:52.780 | Sign up at masterclass.com/lex to get a discount
00:03:55.860 | and to support this podcast.
00:03:58.060 | When I first heard about Masterclass,
00:03:59.620 | I thought it was too good to be true.
00:04:01.620 | I still think it's too good to be true.
00:04:03.820 | For 180 bucks a year, you get an all access pass
00:04:06.760 | to watch courses from, to list some of my favorites,
00:04:10.140 | Chris Hadfield on space exploration,
00:04:12.460 | Neil deGrasse Tyson on scientific thinking
00:04:14.340 | and communication, Will Wright,
00:04:16.500 | creator of SimCity and Sims on game design.
00:04:19.380 | Every time I do this read,
00:04:21.180 | I really want to play a city builder game.
00:04:24.420 | Carlos Santana on guitar, Garry Kasparov on chess,
00:04:27.960 | Daniel Negreanu on poker and many more.
00:04:30.780 | Chris Hadfield explaining how rockets work
00:04:33.500 | and the experience of being launched into space alone
00:04:36.200 | is worth the money.
00:04:37.540 | By the way, you can watch it on basically any device.
00:04:40.740 | Once again, sign up at masterclass.com to get a discount
00:04:43.700 | and to support this podcast.
00:04:45.900 | And now here's my conversation with Dileep George.
00:04:49.700 | Do you think we need to understand the brain
00:04:52.600 | in order to build it?
00:04:54.300 | - Yes, if you want to build the brain,
00:04:56.260 | we definitely need to understand how it works.
00:04:58.660 | So Blue Brain or Henry Markram's project
00:05:02.380 | is trying to build a brain without understanding it,
00:05:05.740 | like just trying to put details of the brain
00:05:10.700 | from neuroscience experiments into a giant simulation.
00:05:16.100 | By putting more and more neurons, more and more details.
00:05:19.060 | But that is not going to work
00:05:21.500 | because when it doesn't perform as what you expect it to do,
00:05:26.500 | then what do you do?
00:05:28.060 | You do, you just keep adding more details.
00:05:29.980 | How do you debug it?
00:05:30.940 | So unless you understand,
00:05:33.980 | unless you have a theory about
00:05:35.820 | how the system is supposed to work,
00:05:37.380 | how the pieces are supposed to fit together,
00:05:39.380 | what they're going to contribute, you can't build it.
00:05:42.300 | - At the functional level, understand.
00:05:44.300 | So can you actually linger on
00:05:46.140 | and describe the Blue Brain project?
00:05:48.660 | It's kind of a fascinating principle,
00:05:52.820 | an idea to try to simulate the brain.
00:05:55.860 | We're talking about the human brain, right?
00:05:57.740 | - Right, human brains and rat brains or cat brains
00:06:02.340 | have lots in common.
00:06:03.540 | That the cortex, the neocortex structure is very similar.
00:06:08.220 | So initially they were trying to just simulate a cat brain
00:06:13.780 | and--
00:06:14.860 | - To understand the nature of evil.
00:06:17.300 | - To understand the nature of evil.
00:06:19.420 | Or as it happens in most of the simulations,
00:06:22.740 | you easily get one thing out, which is oscillations.
00:06:27.700 | If you simulate a large number of neurons, they oscillate.
00:06:32.340 | And you can adjust the parameters and say that,
00:06:35.420 | oh, oscillations match the rhythm
00:06:37.980 | that we see in the brain, et cetera.
00:06:39.740 | But--
00:06:40.580 | - Oh, I see.
00:06:41.420 | So the idea is, is the simulation
00:06:44.340 | at the level of individual neurons?
00:06:46.900 | - Yeah, so the Blue Brain project,
00:06:49.260 | the original idea as proposed was,
00:06:51.980 | you put very detailed biophysical neurons,
00:06:57.260 | biophysical models of neurons.
00:07:00.460 | And you interconnect them according to the statistics
00:07:04.940 | of connections that we have found
00:07:06.420 | from real neuroscience experiments.
00:07:08.540 | And then turn it on.
00:07:10.740 | And see what happens.
00:07:13.100 | And these neural models are incredibly complicated
00:07:16.700 | in themselves, right?
00:07:17.940 | Because these neurons are modeled using this idea
00:07:22.940 | called Hodgkin-Huxley models,
00:07:24.580 | which are about how signals propagate in a cable.
00:07:28.260 | And there are active dendrites, all those phenomena,
00:07:31.860 | which those phenomena themselves,
00:07:34.060 | we don't understand that well.
00:07:36.020 | And then we put in connectivity,
00:07:38.700 | which is part guesswork, part observed.
00:07:42.580 | And of course, if we do not have any theory
00:07:44.620 | about how it is supposed to work,
00:07:46.580 | we just have to take whatever comes out of it as,
00:07:52.500 | okay, this is something interesting.
00:07:54.900 | - But in your sense, these models
00:07:56.780 | of the way signal travels along,
00:07:59.620 | like with the axons and all the basic models,
00:08:01.940 | they're too crude?
00:08:04.500 | - Oh, well, actually, they are pretty detailed
00:08:07.380 | and pretty sophisticated.
00:08:10.340 | And they do replicate the neural dynamics.
00:08:14.620 | If you take a single neuron,
00:08:16.460 | and you try to turn on the different channels,
00:08:20.980 | the calcium channels and the different receptors,
00:08:24.780 | and see what the effect of turning on
00:08:27.820 | or off those channels are in the neuron's spike output,
00:08:32.740 | people have built pretty sophisticated models of that.
00:08:35.420 | And they are, I would say, in the regime of correct.
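
To make the kind of single-neuron biophysical modeling described here concrete, below is a minimal single-compartment Hodgkin-Huxley sketch in Python. It is only an illustration of what "modeling the channels" means: the parameters are the standard textbook squid-axon values, the integration is plain forward Euler, and it leaves out the multi-compartment cables and active dendrites that the detailed models mentioned above include.

```python
import numpy as np

# Minimal single-compartment Hodgkin-Huxley sketch (textbook squid-axon
# parameters, forward-Euler integration). Illustrative only.
C_m = 1.0                                   # membrane capacitance, uF/cm^2
g_Na, g_K, g_L = 120.0, 36.0, 0.3           # max conductances, mS/cm^2
E_Na, E_K, E_L = 50.0, -77.0, -54.4         # reversal potentials, mV

def alpha_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def beta_m(V):  return 4.0 * np.exp(-(V + 65.0) / 18.0)
def alpha_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def beta_h(V):  return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
def alpha_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def beta_n(V):  return 0.125 * np.exp(-(V + 65.0) / 80.0)

dt, T = 0.01, 50.0                          # time step and duration, ms
V, m, h, n = -65.0, 0.05, 0.6, 0.32         # resting membrane state
spike_times = []
for step in range(int(T / dt)):
    t = step * dt
    I_ext = 10.0 if 5.0 <= t <= 45.0 else 0.0        # injected current, uA/cm^2
    # Ionic currents through the sodium, potassium, and leak channels.
    I_Na = g_Na * m**3 * h * (V - E_Na)
    I_K  = g_K * n**4 * (V - E_K)
    I_L  = g_L * (V - E_L)
    dV = (I_ext - I_Na - I_K - I_L) / C_m
    # Gating-variable kinetics: the channels "turning on and off".
    m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
    h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
    n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
    V_prev, V = V, V + dt * dV
    if V_prev < 0.0 <= V:                             # crude spike detection
        spike_times.append(round(t, 2))

print(len(spike_times), "spikes at (ms):", spike_times)
```

Scaling g_Na or g_K up or down in this sketch changes the spike output, which is roughly the kind of channel-level manipulation being described.
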
00:08:40.420 | - Well, see, the correctness, that's interesting,
00:08:43.300 | 'cause you've mentioned it at several levels.
00:08:45.860 | The correctness is measured
00:08:46.980 | by looking at some kind of aggregate statistics?
00:08:49.580 | - It would be more the spiking dynamics of the--
00:08:53.260 | - Spiking dynamics of the single neuron, okay.
00:08:55.020 | - Yeah, and yeah, these models,
00:08:57.980 | because they are going to the level of mechanism, right?
00:09:00.780 | So they are basically looking at,
00:09:02.620 | okay, what is the effect of turning on an ion channel?
00:09:06.660 | And you can model that using electric circuits.
00:09:10.980 | And so they are models;
00:09:13.420 | it is not just function fitting,
00:09:16.460 | people are looking at the mechanism underlying it,
00:09:19.260 | and putting that in terms of electric circuit theory,
00:09:23.540 | signal propagation theory, and modeling that.
00:09:26.340 | And so those models are sophisticated,
00:09:29.420 | but getting a single neurons model 99% right
00:09:34.420 | does not still tell you how to,
00:09:38.460 | you know, it would be the analog
00:09:39.700 | of getting a transistor model right,
00:09:42.980 | and now trying to build a microprocessor.
00:09:46.300 | And if you just observe, you know,
00:09:49.220 | if you did not understand how a microprocessor works,
00:09:52.500 | but you say, oh, I now can model one transistor well,
00:09:56.180 | and now I will just try to interconnect the transistors
00:10:00.140 | according to whatever I could, you know,
00:10:02.100 | guess from the experiments and try to simulate it,
00:10:06.300 | then it is very unlikely
00:10:08.100 | that you will produce a functioning microprocessor.
00:10:11.500 | You want to, you know,
00:10:12.340 | when you want to produce a functioning microprocessor,
00:10:14.700 | you want to understand Boolean logic,
00:10:16.860 | how does, how do the gates work, all those things,
00:10:20.220 | and then, you know,
00:10:21.300 | understand how do those gates get implemented
00:10:23.020 | using transistors.
00:10:23.980 | - Yeah, there's actually,
00:10:25.220 | I remember this reminds me, there's a paper,
00:10:27.140 | maybe you're familiar with it,
00:10:29.020 | that I remember going through in a reading group
00:10:31.420 | that approaches a microprocessor
00:10:34.540 | from a perspective of a neuroscientist.
00:10:36.780 | I think it basically,
00:10:39.140 | it uses all the tools that we have of neuroscience
00:10:42.380 | to try to understand,
00:10:43.620 | as if aliens just showed up to study computers,
00:10:47.820 | - Yeah.
00:10:48.660 | - And to see if those tools can be used
00:10:50.940 | to get any kind of sense of how the microprocessor works.
00:10:54.500 | I think the final, the takeaway from,
00:10:57.860 | at least this initial exploration is that we're screwed.
00:11:02.380 | There's no way that the tools of neuroscience
00:11:04.140 | would be able to get us to anything,
00:11:06.340 | like not even Boolean logic.
00:11:07.940 | I mean, it's just,
00:11:09.700 | any aspect of the architecture of the function
00:11:16.280 | of the processes involved,
00:11:19.500 | the clocks, the timing, all that,
00:11:21.820 | you can't figure that out from the tools of neuroscience.
00:11:24.480 | - Yeah, so I'm very familiar with this particular paper.
00:11:27.700 | I think it was called,
00:11:30.340 | Could a Neuroscientist Understand a Microprocessor?
00:11:34.140 | - Yeah.
00:11:34.980 | - Something like that.
00:11:35.820 | Following the methodology in that paper,
00:11:38.580 | even an electrical engineer
00:11:40.020 | would not understand microprocessors.
00:11:41.460 | So I could, so,
00:11:42.300 | (both laughing)
00:11:44.380 | So I don't think it is that bad in the sense of saying,
00:11:49.220 | neuroscientists do find valuable things
00:11:53.100 | by observing the brain.
00:11:55.200 | They do find good insights,
00:11:58.360 | but those insights cannot be put together
00:12:01.680 | just as a simulation.
00:12:03.280 | You have to investigate
00:12:06.080 | what are the computational underpinnings
00:12:08.760 | of those findings.
00:12:10.380 | How do all of them fit together
00:12:13.040 | from an information processing perspective?
00:12:16.080 | You have to, somebody has to painstakingly
00:12:20.040 | put those things together and build hypothesis.
00:12:22.920 | So I don't want to diss all of neuroscientists saying,
00:12:25.700 | oh, they're not finding anything.
00:12:26.780 | No, that paper almost went to that level of,
00:12:29.760 | neuroscientists will never understand.
00:12:32.900 | No, that's not true.
00:12:34.220 | I think they do find lots of useful things,
00:12:36.740 | but it has to be put together in a computational framework.
00:12:39.940 | - Yeah, I mean, but just the AI systems
00:12:43.580 | will be listening to this podcast 100 years from now,
00:12:46.500 | and they will probably,
00:12:47.780 | there's some non-zero probability
00:12:50.820 | they'll find your words laughable.
00:12:52.620 | They're like, I remember humans thought
00:12:55.020 | they understood something about the brain
00:12:56.940 | and they were totally clueless.
00:12:58.200 | There's a sense about neuroscience
00:12:59.780 | that we may be in the very, very early days
00:13:02.140 | of understanding the brain.
00:13:04.260 | But I mean, that's one perspective.
00:13:07.100 | In your perspective,
00:13:10.140 | how far are we into understanding
00:13:12.900 | any aspect of the brain?
00:13:18.140 | So from the dynamics of individual neuron communication
00:13:22.000 | to how, in a collective sense,
00:13:26.660 | how they're able to store information,
00:13:29.100 | transfer information,
00:13:30.820 | how the intelligence then emerges,
00:13:32.580 | all that kind of stuff.
00:13:33.420 | Where are we on that timeline?
00:13:35.100 | - Yeah, so timelines are very, very hard to predict,
00:13:38.500 | and you can, of course, be wrong.
00:13:40.860 | And it can be wrong on either side.
00:13:44.140 | We know that when we look back,
00:13:47.620 | the first flight was in 1903.
00:13:51.060 | In 1900, there was a New York Times article
00:13:54.860 | on flying machines that do not fly.
00:13:57.940 | And humans might not fly for another 100 years.
00:14:02.700 | That was what that article stated.
00:14:04.780 | And so, but no, they flew three years after that.
00:14:08.380 | So it's very hard to, so-
00:14:11.540 | - Well, and on that point, one of the Wright brothers,
00:14:15.140 | I think two years before,
00:14:18.380 | said that, he said some number, like 50 years,
00:14:22.900 | he has become convinced that it's impossible.
00:14:27.900 | - Even during their experimentation, yeah, yeah, yeah.
00:14:31.860 | - I mean, that's a tribute to when,
00:14:34.140 | that's like the entrepreneurial battle of depression,
00:14:37.140 | of going through just thinking this is impossible.
00:14:40.700 | But there, yeah, there's something,
00:14:42.660 | even the person that's in it
00:14:44.380 | is not able to see, estimate correctly.
00:14:47.420 | - Exactly, but I can tell from the point of,
00:14:50.500 | objectively, what are the things that we know about the brain
00:14:53.620 | and how that can be used to build AI models,
00:14:57.140 | which can then go back and inform how the brain works.
00:15:00.820 | So my way of understanding the brain
00:15:02.820 | would be to basically say,
00:15:04.220 | look at the insights neuroscientists have found,
00:15:07.180 | understand that from a computational angle,
00:15:11.140 | information processing angle, build models using that.
00:15:15.300 | And then building that model, which functions,
00:15:19.420 | which is a functional model,
00:15:20.500 | which is doing the task that we want the model to do.
00:15:23.780 | It is not just trying to model a phenomena in the brain.
00:15:26.780 | It is trying to do what the brain is trying to do
00:15:29.220 | on the whole functional level.
00:15:32.340 | And building that model will help you
00:15:34.940 | fill in the missing pieces that,
00:15:37.460 | biology just gives you the hints.
00:15:39.340 | And building the model,
00:15:41.820 | fills in the rest of the pieces of the puzzle.
00:15:44.700 | And then you can go and connect that back to biology
00:15:47.660 | and say, okay, now it makes sense
00:15:49.580 | that this part of the brain is doing this,
00:15:53.660 | or this layer in the cortical circuit is doing this.
00:15:57.260 | And then continue this iteratively,
00:16:01.100 | because now that will inform new experiments
00:16:04.020 | in neuroscience.
00:16:05.020 | And of course, building the model
00:16:07.100 | and verifying that in the real world
00:16:08.900 | will also tell you more about,
00:16:11.780 | does the model actually work?
00:16:13.540 | And you can refine the model,
00:16:14.980 | find better ways of putting
00:16:17.260 | these neuroscience insights together.
00:16:19.580 | So I would say it is,
00:16:21.780 | so neuroscientists alone, just from experimentation,
00:16:27.180 | will not be able to build a model of the brain,
00:16:30.180 | or a functional model of the brain.
00:16:31.860 | So there's lots of efforts,
00:16:35.380 | which are very impressive efforts
00:16:36.540 | in collecting more and more connectivity data
00:16:40.340 | from the brain.
00:16:41.580 | How are the microcircuits of the brain
00:16:44.860 | connected with each other?
00:16:45.700 | - Those are beautiful, by the way.
00:16:47.220 | - Those are beautiful.
00:16:48.340 | And at the same time, those do not,
00:16:54.020 | by themselves, convey the story of how it works.
00:16:57.820 | And somebody has to understand,
00:16:59.900 | okay, why are they connected like that?
00:17:01.740 | And what are those things doing?
00:17:04.580 | And we do that by building models in AI,
00:17:08.100 | using hints from neuroscience, and repeat the cycle.
00:17:11.980 | - So what aspect of the brain
00:17:15.820 | are useful in this whole endeavor?
00:17:17.700 | Which, by the way, I should say,
00:17:19.300 | you're both a neuroscientist and an AI person.
00:17:23.140 | I guess the dream is to both understand the brain
00:17:25.460 | and to build AGI systems.
00:17:27.460 | So you're, it's like an engineer's perspective
00:17:31.860 | of trying to understand the brain.
00:17:33.860 | So what aspects of the brain, functionally speaking,
00:17:37.220 | like you said, do you find interesting?
00:17:38.820 | - Yeah, quite a lot of things.
00:17:40.460 | So one is, if you look at the visual cortex,
00:17:45.180 | and the visual cortex is a large part of the brain.
00:17:50.860 | I forgot the exact fraction,
00:17:52.420 | but it's a huge part of our brain area
00:17:55.580 | is occupied by just vision.
00:17:59.100 | So vision, visual cortex is not just
00:18:01.660 | a feed-forward cascade of neurons.
00:18:05.820 | There are a lot more feedback connections in the brain
00:18:08.340 | compared to the feed-forward connections.
00:18:10.020 | And it is surprising to the level of detail
00:18:14.340 | neuroscientists have actually studied this.
00:18:16.380 | If you go into neuroscience literature
00:18:18.500 | and poke around and ask, have they studied
00:18:21.860 | what will be the effect of poking a neuron
00:18:24.740 | in level IT in level V1?
00:18:29.100 | And have they studied that?
00:18:32.500 | And you will say, yes, they have studied that.
00:18:35.380 | - So every possible combination has been studied.
00:18:38.820 | - I mean, it's not a random exploration at all.
00:18:41.620 | It's very hypothesis-driven, right?
00:18:44.140 | They are very, experimental neuroscientists
00:18:46.420 | are very, very systematic in how they probe the brain.
00:18:49.340 | Because experiments are very costly to conduct.
00:18:52.100 | They take a lot of preparation.
00:18:53.580 | They need a lot of control.
00:18:55.180 | So they are very hypothesis-driven
00:18:57.340 | in how they probe the brain.
00:18:58.580 | And often what I find is that when we have a question
00:19:03.900 | in AI about, has anybody probed
00:19:07.780 | how lateral connections in the brain works?
00:19:10.620 | And when you go and read the literature,
00:19:11.980 | yes, people have probed it,
00:19:13.140 | and people have probed it very systematically.
00:19:15.260 | And they have hypothesis about how those lateral connections
00:19:20.100 | are supposedly contributing to visual processing.
00:19:24.540 | But of course, they haven't built
00:19:26.220 | very, very functional detailed models of it.
00:19:28.820 | - By the way, how do they, in those studies,
00:19:30.460 | sorry to interrupt, do they stimulate like a neuron
00:19:33.780 | in one particular area of the visual cortex
00:19:36.540 | and then see how the signal travels kind of thing?
00:19:39.740 | - Fascinating, very, very fascinating experiments.
00:19:41.940 | So I can give you one example I was impressed with.
00:19:44.660 | This is, so before going to that,
00:19:46.900 | let me give you an overview
00:19:50.780 | of how the layers in the cortex are organized.
00:19:53.580 | Visual cortex is organized
00:19:56.060 | into roughly four hierarchical levels.
00:19:58.460 | Okay, so V1, V2, V4, IT.
00:20:01.700 | And in V1-
00:20:03.180 | - What happened to V3?
00:20:04.700 | - Well, yeah, there's another pathway.
00:20:06.780 | Okay, so there is, this is,
00:20:08.260 | I'm talking about just object recognition pathway.
00:20:10.580 | - All right, cool.
00:20:11.740 | - And then in V1 itself,
00:20:14.780 | so it's, there is a very detailed microcircuit in V1 itself.
00:20:19.420 | There is organization within a level itself.
00:20:22.580 | The cortical sheet is organized into multiple layers,
00:20:26.980 | and there is a columnar structure.
00:20:28.860 | And this layer-wise and columnar structure
00:20:32.580 | is repeated in V1, V2, V4, IT, all of them, right?
00:20:37.580 | And the connections between these layers within a level,
00:20:41.500 | in V1 itself, there are six layers, roughly,
00:20:44.980 | and the connections between them,
00:20:46.500 | there is a particular structure to them.
00:20:48.700 | And now, so one example of an experiment people did is,
00:20:53.700 | when you present a stimulus, which is,
00:21:00.180 | let's say, requires separating the foreground
00:21:03.940 | from the background of an object.
00:21:05.660 | So it's a textured triangle on a textured background.
00:21:10.660 | And you can check, does the surface settle first,
00:21:15.700 | or does the contour settle first?
00:21:17.540 | - Settle?
00:21:19.940 | - Settle in the sense that the,
00:21:21.780 | so when you finally form the percept of the triangle,
00:21:28.220 | you understand where the contours of the triangle are,
00:21:31.140 | and you also know where the inside of the triangle is,
00:21:33.980 | right, that's when you form the final percept.
00:21:36.500 | Now, you can ask, what is the dynamics
00:21:40.060 | of forming that final percept?
00:21:41.860 | Do the neurons first find the edges
00:21:48.060 | and converge on where the edges are,
00:21:51.780 | and then they find the inner surfaces,
00:21:54.740 | or does it go the other way around?
00:21:55.940 | - The other way around.
00:21:56.900 | So what's the answer?
00:21:58.460 | - In this case, it turns out that it first settles
00:22:02.260 | on the edges, it converges on the edge hypothesis first,
00:22:06.100 | and then the surfaces are filled in
00:22:09.100 | from the edges to the inside.
00:22:11.140 | - That's fascinating.
00:22:12.180 | - And the detail to which you can study this,
00:22:15.900 | it's amazing that you can actually not only find
00:22:20.020 | the temporal dynamics of when this happens,
00:22:23.300 | but you can also find which layer in V1,
00:22:27.300 | which layer is encoding the edges,
00:22:30.580 | which layer is encoding the surfaces,
00:22:33.100 | and which layer is encoding the feedback,
00:22:36.060 | which layer is encoding the feedforward,
00:22:37.620 | and what's the combination of them
00:22:39.620 | that produces the final percept.
00:22:41.220 | And these kinds of experiments stand out
00:22:44.580 | when you try to explain illusions.
00:22:47.100 | One example of a favorite illusion of mine
00:22:50.140 | is the Kanizsa triangle,
00:22:51.220 | I don't know that you are familiar with this one.
00:22:53.260 | So this is an example where it's a triangle,
00:22:58.220 | but only the corners of the triangle
00:23:01.500 | are shown in the stimulus.
00:23:04.100 | So they look like kind of Pac-Man.
00:23:06.300 | - Oh, the black Pac-Man, yeah.
00:23:08.780 | - And then your visual system hallucinates the edges.
00:23:13.020 | And when you look at it, you will see a faint edge.
00:23:17.660 | And you can go inside the brain and look,
00:23:21.820 | do actually neurons signal the presence of this edge?
00:23:25.700 | And if they signal, how do they do it?
00:23:28.580 | Because they are not receiving anything from the input.
00:23:32.100 | The input is black for those neurons, right?
00:23:35.340 | So how do they signal it?
00:23:37.540 | When does the signaling happen?
00:23:39.180 | So if a real contour is present in the input,
00:23:43.460 | then the neurons immediately signal,
00:23:46.780 | oh, okay, there is an edge here.
00:23:49.220 | When it is an illusory edge,
00:23:51.940 | it is clearly not in the input,
00:23:53.580 | it is coming from the context.
00:23:55.620 | So those neurons fire later.
00:23:57.980 | And you can say that, okay,
00:24:03.460 | it's the feedback connections that are causing them to fire.
00:24:03.460 | And they happen later,
00:24:06.100 | and you can find the dynamics of them.
00:24:08.980 | So these studies are pretty impressive and very detailed.
00:24:13.380 | - So by the way, just a step back,
00:24:16.940 | you said that there may be more feedback connections
00:24:20.220 | than feedforward connections.
00:24:21.740 | First of all, just for like a machine learning folks,
00:24:26.740 | I mean, that's crazy
00:24:29.380 | that there's all these feedback connections.
00:24:32.180 | We often think about,
00:24:34.220 | thanks to deep learning,
00:24:38.460 | you start to think about the human brain
00:24:41.780 | as a kind of feedforward mechanism.
00:24:44.780 | So what the heck are these feedback connections?
00:24:48.660 | - Yeah.
00:24:49.500 | - What's the dynamics?
00:24:52.260 | What are we supposed to think about them?
00:24:54.060 | - Yeah, so this fits into a very beautiful picture
00:24:57.180 | about how the brain works, right?
00:24:59.380 | So the beautiful picture of how the brain works
00:25:02.340 | is that our brain is building a model of the world.
00:25:06.420 | I know, so our visual system is building a model
00:25:10.380 | of how objects behave in the world.
00:25:12.860 | And we are constantly projecting
00:25:15.260 | that model back onto the world.
00:25:17.180 | So what we are seeing is not just a feedforward thing
00:25:21.300 | that just gets interpreted in a feedforward part.
00:25:23.860 | We are constantly projecting our expectations
00:25:26.180 | onto the world.
00:25:27.220 | And what the final percept is a combination
00:25:30.300 | of what we project onto the world,
00:25:32.620 | combined with what the actual sensory input is.
00:25:36.860 | - Almost like trying to calculate the difference
00:25:39.140 | and then trying to interpret the difference.
00:25:40.980 | - Yeah, I wouldn't put it as calculating the difference.
00:25:44.580 | It's more like what is the best explanation
00:25:46.860 | for the input stimulus based on the model
00:25:51.140 | of the world I have?
00:25:52.460 | - Got it, got it.
00:25:53.900 | And that's where all the illusions come in.
00:25:56.060 | But that's an incredibly efficient process.
00:26:00.060 | So the feedback mechanism, it just helps you constantly,
00:26:04.780 | yeah, to hallucinate how the world should be
00:26:07.460 | based on your world model.
00:26:08.540 | And then just looking at if there's novelty,
00:26:13.540 | like trying to explain it.
00:26:16.460 | Hence, that's why movement,
00:26:18.820 | we detect movement really well.
00:26:20.380 | There's all these kinds of things.
00:26:21.740 | And this is like at all different levels
00:26:24.740 | of the cortex you're saying.
00:26:26.860 | This happens at the lowest level, at the highest level.
00:26:29.300 | - Yes, yeah.
00:26:30.260 | In fact, feedback connections are more prevalent
00:26:32.340 | everywhere in the cortex.
00:26:33.820 | And so one way to think about it,
00:26:37.340 | and there's a lot of evidence for this,
00:26:38.700 | is inference.
00:26:41.180 | So basically, if you have a model of the world,
00:26:44.180 | and when some evidence comes in,
00:26:47.540 | what you are doing is inference.
00:26:49.700 | You are trying to now explain this evidence
00:26:53.420 | using your model of the world.
00:26:55.460 | And this inference includes projecting your model
00:26:59.060 | onto the evidence and taking the evidence
00:27:02.620 | back into the model and doing an iterative procedure.
00:27:06.780 | And this iterative procedure is what happens
00:27:10.340 | using the feedforward feedback propagation.
00:27:13.180 | And feedback affects what you see in the world,
00:27:16.340 | and it also affects feedforward propagation.
00:27:18.780 | And examples are everywhere.
00:27:21.220 | We see these kinds of things everywhere.
00:27:24.660 | The idea that there can be multiple competing hypothesis
00:27:28.460 | in our model, trying to explain the same evidence,
00:27:32.580 | and then you have to kind of make them compete.
00:27:35.940 | And one hypothesis will explain away the other hypothesis
00:27:40.660 | through this competition process.
00:27:42.260 | - Wait, what?
00:27:43.100 | So you have competing models of the world
00:27:47.020 | that try to explain, what do you mean by explain away?
00:27:50.180 | - So this is a classic example in graphical models,
00:27:54.100 | probabilistic models.
00:27:55.620 | So if you-- - Oh, what are those?
00:27:58.620 | - Okay. (laughs)
00:28:01.100 | - I think it's useful to mention
00:28:02.500 | because we'll talk about them more.
00:28:04.260 | - Yeah, yeah.
00:28:05.180 | So neural networks are one class
00:28:08.580 | of machine learning models.
00:28:10.900 | You have distributed set of nodes,
00:28:13.820 | which are called the neurons.
00:28:15.780 | Each one is doing a dot product
00:28:17.260 | and you can approximate any function
00:28:19.060 | using this multilevel network of neurons.
00:28:22.460 | So that's a class of models
00:28:24.780 | which are useful for function approximation.
00:28:27.500 | There is another class of models in machine learning
00:28:30.740 | called probabilistic graphical models.
00:28:33.140 | And you can think of them as each node in that model
00:28:38.140 | is variable, which is talking about something.
00:28:41.980 | It can be a variable representing,
00:28:44.540 | is an edge present in the input or not?
00:28:48.100 | And at the top of the network,
00:28:52.020 | a node can be representing,
00:28:55.260 | is there an object present in the world or not?
00:28:58.380 | And then, so it is another way of encoding knowledge.
00:29:03.380 | And then once you encode the knowledge,
00:29:08.260 | you can do inference in the right way.
00:29:12.500 | What is the best way to explain some set of evidence
00:29:17.500 | using this model that you encoded?
00:29:19.660 | So when you encode the model,
00:29:21.020 | you are encoding the relationship
00:29:22.780 | between these different variables.
00:29:23.860 | How is the edge connected to the model of the object?
00:29:27.660 | How is the surface connected to the model of the object?
00:29:30.060 | And then, of course,
00:29:32.700 | this is a very distributed, complicated model.
00:29:34.900 | And inference is, how do you explain a piece of evidence
00:29:39.900 | when a set of stimulus comes in?
00:29:41.420 | If somebody tells me there is a 50% probability
00:29:44.460 | that there is an edge here in this part of the model,
00:29:46.900 | how does that affect my belief on whether I should think
00:29:51.180 | that there should be a square present in the image?
00:29:54.460 | So this is the process of inference.
00:29:56.940 | So one example of inference is having this
00:30:00.500 | expanding effect between multiple causes.
00:30:03.180 | So graphical models can be used
00:30:06.140 | to represent causality in the world.
00:30:08.940 | So let's say, you know, your alarm at home
00:30:13.940 | can be triggered by a burglar getting into your house,
00:30:21.220 | or it can be triggered by an earthquake.
00:30:24.860 | Both can be causes of the alarm going off.
00:30:27.900 | So now, you're in your office,
00:30:31.380 | you heard burglar alarm going off,
00:30:33.380 | you are heading home,
00:30:35.380 | thinking that there's a burglar, got it.
00:30:37.500 | But while driving home, if you hear on the radio
00:30:39.860 | that there was an earthquake in the vicinity,
00:30:42.460 | now your strength of evidence
00:30:45.980 | for a burglar getting into your house is diminished.
00:30:50.180 | Because now that piece of evidence is explained
00:30:53.380 | by the earthquake being present.
00:30:55.660 | So if you think about these two causes
00:30:57.700 | explaining at lower level variable, which is alarm,
00:31:00.940 | now what we're seeing is that
00:31:03.340 | increasing the evidence for some cause,
00:31:05.580 | there is evidence coming in from below
00:31:08.660 | for alarm being present.
00:31:10.380 | And initially it was flowing to a burglar being present,
00:31:14.300 | but now since there is side evidence for this other cause,
00:31:19.300 | it explains away this evidence
00:31:21.140 | and evidence will now flow to the other cause.
00:31:23.260 | This is two competing causal things
00:31:26.580 | trying to explain the same evidence.
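
The burglar/earthquake/alarm story can be written down as a tiny Bayesian network and queried directly, which makes explaining away easy to see. The sketch below uses the usual textbook numbers for this example (they are illustrative, not from the conversation), and computes posteriors by brute-force enumeration rather than any clever inference scheme.

```python
from itertools import product

# Toy burglary/earthquake/alarm network; all probabilities are illustrative.
P_burglary = 0.001
P_earthquake = 0.002
# P(alarm = True | burglary, earthquake)
P_alarm = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}

def joint(b, e, a):
    """Joint probability of one full assignment (burglary, earthquake, alarm)."""
    pb = P_burglary if b else 1 - P_burglary
    pe = P_earthquake if e else 1 - P_earthquake
    pa = P_alarm[(b, e)] if a else 1 - P_alarm[(b, e)]
    return pb * pe * pa

def posterior_burglary(evidence):
    """P(burglary = True | evidence), by enumerating every assignment."""
    num = den = 0.0
    for b, e, a in product([True, False], repeat=3):
        assignment = {"burglary": b, "earthquake": e, "alarm": a}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, e, a)
        den += p
        if b:
            num += p
    return num / den

# Hearing the alarm raises the belief in a burglar...
print(posterior_burglary({"alarm": True}))                      # ~0.37
# ...but learning about the earthquake explains that evidence away.
print(posterior_burglary({"alarm": True, "earthquake": True}))  # ~0.003
```

Conditioning on the alarm alone pushes the burglary posterior to about 0.37; adding the earthquake observation drops it to about 0.003, because the earthquake now accounts for the alarm.
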
00:31:28.380 | - And the brain has a similar kind of mechanism
00:31:31.460 | for doing so.
00:31:32.460 | That's kind of interesting.
00:31:33.820 | How's that all encoded in the brain?
00:31:38.460 | Like where's the storage of information?
00:31:40.380 | Are we talking, just maybe to get it
00:31:43.740 | a little bit more specific,
00:31:45.620 | is it in the hardware of the actual connections?
00:31:48.220 | Is it in chemical communication?
00:31:51.020 | Is it electrical communication?
00:31:53.180 | Do we know?
00:31:54.340 | - So this is a paper that we are bringing out soon.
00:31:57.900 | - Which one is this?
00:31:59.220 | - This is the cortical microcircuits paper
00:32:01.300 | that I sent you a draft of.
00:32:03.220 | Of course, a lot of it is still hypothesis.
00:32:06.100 | One hypothesis is that you can think of a cortical column
00:32:09.980 | as encoding a concept.
00:32:13.060 | A concept, think of it as an example of a concept
00:32:18.060 | is an edge present or not,
00:32:22.180 | or is an object present or not.
00:32:25.180 | Okay, so you can think of it as a binary variable,
00:32:27.420 | a binary random variable,
00:32:28.860 | the presence of an edge or not,
00:32:30.460 | or the presence of an object or not.
00:32:32.140 | So each cortical column can be thought of
00:32:34.380 | as representing that one concept, one variable.
00:32:38.140 | And then the connections between these cortical columns
00:32:41.420 | are basically encoding the relationship
00:32:43.820 | between these random variables.
00:32:45.700 | And then there are connections within the cortical column.
00:32:49.460 | Each cortical column is implemented
00:32:51.180 | using multiple layers of neurons
00:32:53.180 | with very, very, very rich structure there.
00:32:57.780 | There are thousands of neurons in a cortical column.
00:33:00.700 | - But that structure is similar
00:33:02.140 | across the different cortical columns.
00:33:03.620 | - Correct, correct.
00:33:04.940 | And also these cortical columns
00:33:06.820 | connect to a substructure called thalamus.
00:33:09.220 | So all cortical columns pass through this substructure.
00:33:14.220 | So our hypothesis is that
00:33:17.300 | the connections between the cortical columns
00:33:18.980 | implement this, that's where the knowledge is stored
00:33:22.900 | about how these different concepts connect to each other.
00:33:27.300 | And then the neurons inside this cortical column
00:33:30.860 | and in thalamus in combination
00:33:33.060 | implement this actual computations in data for inference,
00:33:38.060 | which includes explaining away
00:33:39.980 | and competing between the different hypothesis.
00:33:43.740 | And it is all very,
00:33:46.100 | so what is amazing is that
00:33:47.540 | neuroscientists have actually done experiments
00:33:51.940 | to the tune of showing these things.
00:33:54.460 | They might not be putting it
00:33:55.620 | in the overall inference framework,
00:33:58.620 | but they will show things like,
00:34:00.460 | if I poke this higher level neuron,
00:34:03.260 | it will inhibit through this complicated loop
00:34:06.020 | through the thalamus,
00:34:06.860 | it will inhibit this other column.
00:34:08.580 | So they will do such experiments.
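
One way to read this hypothesis computationally (a gloss for illustration, not the notation of the paper being discussed): treat each column as a binary random variable, treat the connections between columns as pairwise compatibility factors where the relational knowledge lives, and treat inference as computing beliefs over those variables given bottom-up evidence. A toy sketch with invented numbers:

```python
from itertools import product

# Toy reading of the column-as-variable hypothesis: two "edge" columns and one
# "object" column are binary variables, and the connections between columns
# are pairwise compatibility factors. All numbers are invented.
def pair_factor(edge_on, object_on):
    # Knowledge stored in the coupling: edges tend to co-occur with the object.
    return 0.9 if edge_on == object_on else 0.1

def evidence_factor(value, observed):
    # Soft bottom-up evidence, e.g. from a noisy local edge detector.
    if observed is None:
        return 1.0
    return 0.8 if value == observed else 0.2

def belief_object_present(evidence):
    """P(object = 1 | evidence), by enumerating all joint states of the columns."""
    num = den = 0.0
    for e_left, e_right, obj in product([0, 1], repeat=3):
        score = (pair_factor(e_left, obj) * pair_factor(e_right, obj) *
                 evidence_factor(e_left, evidence.get("edge_left")) *
                 evidence_factor(e_right, evidence.get("edge_right")))
        den += score
        if obj:
            num += score
    return num / den

print(belief_object_present({}))                                 # ~0.50, prior
print(belief_object_present({"edge_left": 1}))                   # ~0.74
print(belief_object_present({"edge_left": 1, "edge_right": 1}))  # ~0.89
```

In this reading, changing the pair_factor table changes the stored knowledge, while the enumeration stands in for whatever circuitry in the column and thalamus actually carries out the inference.
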
00:34:11.940 | - Do they use terminology of concepts, for example?
00:34:14.780 | So, I mean,
00:34:16.980 | is it something where,
00:34:21.420 | it's easy to anthropomorphize and think about concepts,
00:34:25.140 | like you started moving into logic-based
00:34:28.260 | kind of reasoning systems.
00:34:29.860 | So how would you think of concepts in that kind of way?
00:34:34.140 | Or is it a lot messier, a lot more gray area,
00:34:39.140 | you know, even more gray,
00:34:43.420 | even more messy than the artificial neural network kinds,
00:34:47.620 | kinds of abstractions?
00:34:48.460 | - Easiest way to think of it is a variable, right?
00:34:50.820 | It's a binary variable,
00:34:52.140 | which is showing the presence or absence of something.
00:34:55.900 | - But I guess what I'm asking is,
00:34:58.100 | is that something that we're supposed to think of something
00:35:01.900 | that's human interpretable, of that something?
00:35:04.180 | - It doesn't need to be.
00:35:05.500 | It doesn't need to be human interpretable.
00:35:07.100 | There's no need for it to be human interpretable.
00:35:09.740 | But it's almost like,
00:35:12.700 | you will be able to find some interpretation of it
00:35:17.700 | because it is connected to the other things
00:35:21.220 | that you know about.
00:35:22.100 | - And the point is it's useful somehow.
00:35:25.460 | It's useful as an entity in the graphic,
00:35:30.460 | in connecting to the other entities that are,
00:35:33.020 | let's call them concepts.
00:35:34.740 | Okay, so by the way, are these the cortical microcircuits?
00:35:40.020 | - Correct, these are the cortical microcircuits.
00:35:42.300 | That's what neuroscientists use to talk about the circuits
00:35:45.780 | within a level of the cortex.
00:35:49.140 | So you can think of, let's think of a neural network,
00:35:53.060 | artificial neural network terms.
00:35:54.820 | People talk about the architecture of the,
00:35:56.660 | so how many layers they build,
00:35:59.860 | what is the fan in, fan out, et cetera.
00:36:01.820 | That is the macro architecture.
00:36:03.380 | And then within a layer of the neural network,
00:36:08.500 | you can, the cortical neural network
00:36:11.860 | is much more structured within a level.
00:36:14.620 | There's a lot more intricate structure there.
00:36:17.300 | But even within an artificial neural network,
00:36:20.380 | you can think of feature detection plus pooling as one level.
00:36:24.900 | And so that is kind of a microcircuit.
00:36:27.500 | It's much more complex in the real brain.
00:36:30.980 | So within a level, whatever is that circuitry
00:36:35.500 | within a column of the cortex
00:36:37.660 | and between the layers of the cortex,
00:36:39.340 | that's the microcircuitry.
00:36:41.060 | - I love that terminology.
00:36:42.660 | Machine learning people don't use the circuit terminology,
00:36:45.940 | but they should.
00:36:46.980 | It's nice.
00:36:47.940 | So, okay.
00:36:48.780 | Okay, so that's the cortical microcircuits.
00:36:53.980 | So what's interesting about, what can we say,
00:36:57.300 | what is the paper that you're working on,
00:37:00.820 | propose about the ideas around these cortical microcircuits?
00:37:04.420 | - So this is a fully functional model
00:37:08.060 | for the microcircuits of the visual cortex.
00:37:10.780 | - So the paper focuses, and your idea in our discussions now
00:37:13.700 | is focusing on vision.
00:37:15.660 | - Yeah.
00:37:16.500 | - The visual cortex.
00:37:18.420 | Okay.
00:37:19.260 | So this is a model, this is a full model.
00:37:20.540 | This is how vision works.
00:37:23.020 | - Well, this is a model of--
00:37:24.820 | - A hypothesis.
00:37:25.660 | - Yeah.
00:37:26.500 | - A hypothesis.
00:37:27.340 | - Okay, so let me step back a bit.
00:37:29.500 | So we looked at neuroscience for insights
00:37:32.660 | on how to build a vision model.
00:37:35.020 | - Right.
00:37:35.860 | - And we synthesized all those insights
00:37:38.300 | into a computational model.
00:37:39.780 | This is called the recursive cortical network model
00:37:41.980 | that we used for breaking CAPTCHAs.
00:37:45.140 | And we are using the same model for robotic picking
00:37:48.860 | and tracking of objects.
00:37:50.860 | - And that again is a vision system.
00:37:52.500 | - That's a vision system.
00:37:53.340 | - Computer vision system.
00:37:54.540 | - That's a computer vision system.
00:37:55.500 | - Takes in images and outputs what?
00:37:59.260 | - On one side, it outputs the class of the image
00:38:02.340 | and also segments the image.
00:38:05.620 | And you can also ask it further queries.
00:38:07.420 | Where is the edge of the object?
00:38:08.860 | Where is the interior of the object?
00:38:10.500 | So it's a model that you build to answer multiple questions.
00:38:15.100 | So you're not trying to build a model
00:38:16.580 | for just classification or just segmentation, et cetera.
00:38:19.540 | It's a joint model that can do multiple things.
00:38:23.820 | And so that's the model that we built
00:38:27.980 | using insights from neuroscience.
00:38:30.380 | And some of those insights are,
00:38:32.300 | what is the role of feedback connections?
00:38:34.220 | What is the role of lateral connections?
00:38:36.060 | So all those things went into the model.
00:38:37.940 | The model actually uses feedback connections.
00:38:40.860 | - All these ideas from neuroscience.
00:38:43.020 | - Yeah.
00:38:43.860 | - So what the heck is a recursive cortical network?
00:38:46.420 | Like what are the architecture approaches,
00:38:49.180 | interesting aspects here,
00:38:50.740 | which is essentially a brain inspired approach
00:38:54.900 | to a computer vision?
00:38:56.500 | - Yeah.
00:38:57.340 | So there are multiple layers to this question.
00:38:59.420 | I can go from the very, very top and then zoom in.
00:39:03.420 | So one important thing, constraint that went into the model
00:39:07.060 | is that you should not think vision,
00:39:10.060 | think of vision as something in isolation.
00:39:12.900 | We should not think perception as something
00:39:16.060 | as a pre-processor for cognition.
00:39:18.100 | Perception and cognition are interconnected.
00:39:21.700 | And so you should not think of one problem
00:39:24.140 | in separation from the other problem.
00:39:26.220 | And so that means if you finally want to have a system
00:39:29.860 | that understand concepts about the world
00:39:32.460 | and can learn in a very conceptual model of the world
00:39:35.660 | and can reason and connect to language,
00:39:37.700 | all of those things,
00:39:38.820 | you need to have, think all the way through
00:39:41.660 | and make sure that your perception system
00:39:44.140 | is compatible with your cognition system
00:39:46.340 | and language system and all of them.
00:39:48.220 | And one aspect of that is top-down controllability.
00:39:51.740 | - What does that mean?
00:39:54.460 | - So that means,
00:39:55.300 | - In this context.
00:39:56.140 | - So think of, you can close your eyes
00:39:59.340 | and think about the details of one object.
00:40:02.220 | I can zoom in further and further.
00:40:05.540 | So think of the bottle in front of me.
00:40:08.220 | And now you can think about,
00:40:11.460 | okay, what the cap of that bottle looks.
00:40:14.220 | I know we can think about what's the texture
00:40:16.060 | on that bottle of the cap.
00:40:19.340 | You can think about what will happen if something hits that.
00:40:24.660 | So you can manipulate your visual knowledge
00:40:29.180 | in cognition driven ways.
00:40:32.060 | - Yes.
00:40:32.900 | - And so this top-down controllability
00:40:36.380 | and being able to simulate scenarios in the world.
00:40:40.580 | - So you're not just a passive player
00:40:44.340 | in this perception game.
00:40:45.820 | You can control it.
00:40:47.420 | You have imagination.
00:40:49.500 | - Correct, correct.
00:40:50.620 | So basically, having a generative network,
00:40:54.900 | which is a model,
00:40:55.940 | and it is not just some arbitrary generative network.
00:40:58.820 | It has to be built in a way that it is controllable top-down.
00:41:03.020 | It is not just trying to generate a whole picture at once.
00:41:06.780 | It's not trying to generate
00:41:09.060 | photorealistic things of the world.
00:41:10.500 | You don't have good photorealistic models of the world.
00:41:13.300 | Human brains do not have.
00:41:14.300 | If I, for example, ask you the question,
00:41:17.100 | what is the color of the letter E in the Google logo?
00:41:20.660 | You have no idea.
00:41:23.300 | - No idea.
00:41:24.140 | - You probably have seen it millions of times.
00:41:25.820 | (laughing)
00:41:26.660 | Or not millions of times, hundreds of times.
00:41:28.060 | (laughing)
00:41:28.900 | So it's not, our model is not photorealistic.
00:41:31.300 | But it has other properties that we can manipulate it.
00:41:35.100 | And you can think about filling in
00:41:37.940 | a different color in that logo.
00:41:39.460 | You can think about expanding the letter E.
00:41:42.100 | So you can imagine the consequence of actions
00:41:47.620 | that you have never performed.
00:41:48.780 | So these are the kind of characteristics
00:41:51.100 | the generative model needs to have.
00:41:52.740 | So this is one constraint that went into our model.
00:41:55.140 | So this is, when you read the,
00:41:57.460 | just the perception side of the paper,
00:41:59.180 | it is not obvious that this was a constraint
00:42:01.380 | into the, that went into the model,
00:42:03.220 | this top-down controllability of the generative model.
00:42:06.500 | - So what does top-down controllability in a model look like?
00:42:11.500 | It's a really interesting concept, fascinating concept.
00:42:15.140 | What is that?
00:42:15.980 | Is that the recursiveness gives you that?
00:42:18.700 | Or how do you do it?
00:42:21.220 | - Quite a few things.
00:42:22.060 | It's like, what does the model factor,
00:42:24.860 | factorize, what are the,
00:42:26.620 | what is the model representing as different pieces
00:42:29.100 | in the puzzle?
00:42:29.940 | So in the RCN network, it thinks of the world,
00:42:34.940 | so what I say, the background of an image
00:42:38.420 | is modeled separately from the foreground of the image.
00:42:41.700 | So the objects are separate from the background.
00:42:44.220 | They are different entities.
00:42:45.380 | - So there's a kind of segmentation
00:42:46.980 | that's built in fundamentally to the structure.
00:42:49.260 | And then even that object is composed of parts.
00:42:53.020 | And also, another one is the shape of the object
00:42:56.700 | is differently modeled from the texture of the object.
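
A toy sketch of that kind of factorization (invented here, and far simpler than the RCN itself): a scene is generated by composing an independently chosen background, object shape, and object texture, so each factor can be held fixed or varied on its own. That separation is what makes a top-down query like "same shape, different texture" straightforward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy factored generative model of a scene: background, object shape, and
# object texture are separate factors that get composed. Illustrative only.
def sample_background(h=32, w=32):
    return np.full((h, w), rng.uniform(0.0, 0.3))      # flat dark background

def sample_shape(h=32, w=32):
    mask = np.zeros((h, w), dtype=bool)                # square or disk
    if rng.random() < 0.5:
        mask[8:24, 8:24] = True
    else:
        yy, xx = np.mgrid[:h, :w]
        mask[(yy - 16) ** 2 + (xx - 16) ** 2 <= 64] = True
    return mask

def sample_texture(h=32, w=32):
    if rng.random() < 0.5:
        return np.tile([0.4, 0.9], (h, w // 2))        # stripes
    return rng.uniform(0.5, 1.0, size=(h, w))          # speckle

def render(background, shape_mask, texture):
    image = background.copy()
    image[shape_mask] = texture[shape_mask]            # paste textured shape on top
    return image

# Hold background and shape fixed, vary only the texture: a crude stand-in
# for the top-down manipulations being described.
bg, mask = sample_background(), sample_shape()
img_a = render(bg, mask, sample_texture())
img_b = render(bg, mask, sample_texture())
print(img_a.shape, img_b.shape)
```
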
00:43:00.900 | - Got it.
00:43:02.900 | So there's like these, I've been,
00:43:06.020 | you know who Francois Chollet is?
00:43:08.540 | - Yeah, yeah.
00:43:09.380 | - He's, so there's, he developed this IQ test
00:43:13.460 | type of thing for ARC challenge for,
00:43:16.460 | and it's kind of cool that there's these concepts,
00:43:20.620 | priors that he defines that you bring to the table
00:43:24.020 | in order to be able to reason about basic shapes
00:43:26.540 | and things in an IQ test.
00:43:28.660 | So here you're making it quite explicit
00:43:31.580 | that here are the things that you should be,
00:43:34.900 | these are like distinct things
00:43:36.700 | that you should be able to model in this.
00:43:40.140 | - Keep in mind that you can derive this
00:43:42.820 | from much more general principles.
00:43:44.500 | It doesn't, you don't need to explicitly put it as,
00:43:47.260 | oh, objects versus foreground versus background,
00:43:50.860 | the surface versus texture.
00:43:52.220 | No, these are derivable from more fundamental principles
00:43:56.100 | of how, you know, what's the property
00:43:59.300 | of continuity of natural signals?
00:44:01.060 | - What's the property of continuity of natural signals?
00:44:05.500 | - Yeah.
00:44:06.340 | - By the way, that sounds very poetic, but yeah.
00:44:08.860 | So you're saying that's a,
00:44:10.540 | there's some low-level properties
00:44:12.740 | from which emerges the idea that shapes
00:44:14.700 | should be different than,
00:44:16.540 | like there should be a parts of an object,
00:44:18.620 | there should be, I mean,
00:44:19.980 | - Exactly.
00:44:20.820 | - Kind of like Francois, I mean, there's objectness,
00:44:23.500 | there's all these things that it's kind of crazy
00:44:25.980 | that we humans, I guess, evolved to have
00:44:29.620 | because it's useful for us to perceive the world.
00:44:32.020 | - Correct, correct.
00:44:32.860 | And it derives mostly from the properties
00:44:35.300 | of natural signals.
00:44:36.380 | And so-
00:44:39.020 | - Natural signals.
00:44:40.460 | So natural signals are the kind of things
00:44:42.540 | we'll perceive in the natural world.
00:44:44.740 | - Correct.
00:44:45.580 | - I don't know why that sounds so beautiful,
00:44:47.540 | natural signals, yeah.
00:44:48.700 | - As opposed to a QR code, right?
00:44:50.660 | Which is an artificial signal that we created.
00:44:52.780 | Humans are not very good at classifying QR codes.
00:44:55.620 | We are very good at saying something is a cat or a dog,
00:44:58.460 | but not very good at classifying QR codes,
00:45:00.100 | whereas computers are very good at that.
00:45:03.900 | So our visual system is tuned for natural signals.
00:45:08.460 | And there are fundamental assumptions in the architecture
00:45:11.340 | that are derived from natural signals properties.
00:45:15.140 | - I wonder when you take a hallucinogenic drugs,
00:45:18.340 | does that go into natural or is that closer to QR code?
00:45:22.620 | - It's still natural.
00:45:23.900 | - It's still natural?
00:45:24.780 | - Yeah, because it is still operating using our brains.
00:45:28.060 | - By the way, on that topic, I mean,
00:45:30.140 | I haven't been following,
00:45:31.340 | I think they're becoming legalized in certain,
00:45:33.260 | I can't wait until they become legalized to a degree
00:45:36.980 | that you, like vision science researchers could study it.
00:45:40.100 | - Yeah.
00:45:40.940 | - And then through medical, chemical ways, modify it.
00:45:45.860 | There could be ethical concerns,
00:45:47.060 | but that's another way to study the brain
00:45:49.820 | is to be able to chemically modify it.
00:45:53.180 | There's probably very long a way
00:45:56.980 | to figure out how to do it ethically.
00:45:59.180 | - Yeah, but I think there are studies on that already.
00:46:02.500 | - Already?
00:46:03.340 | - Yeah, I think so.
00:46:04.260 | Because it's not unethical to give it to rats.
00:46:08.100 | - Oh, that's true, that's true.
00:46:09.500 | (both laughing)
00:46:12.180 | There's a lot of drugged up rats out there.
00:46:14.420 | Okay, cool, sorry, sorry to, so okay.
00:46:16.540 | So there's these low level things from natural signals
00:46:21.540 | that can--
00:46:26.580 | - From which these properties will emerge.
00:46:28.660 | - Yes.
00:46:29.500 | - But it is still a very hard problem
00:46:32.660 | on how to encode that.
00:46:34.580 | So you don't, there is no,
00:46:36.580 | so you mentioned the priors Francois wanted to encode
00:46:41.580 | in the abstract reasoning challenge,
00:46:45.020 | but it is not straightforward how to encode those priors.
00:46:48.780 | So some of those challenges,
00:46:51.140 | like the object recognition, completion challenges
00:46:53.860 | are things that we purely use our visual system to do.
00:46:57.260 | It looks like abstract reasoning,
00:46:59.540 | but it is purely an output of the vision system.
00:47:02.460 | For example, completing the corners
00:47:04.340 | of that Kanizsa triangle,
00:47:05.460 | completing the lines of that Kanizsa triangle.
00:47:07.340 | It's a purely a visual system property.
00:47:09.540 | There is no abstract reasoning involved.
00:47:11.220 | It uses all these priors,
00:47:13.140 | but it is stored in our visual system in a particular way
00:47:17.140 | that is amenable to inference.
00:47:19.180 | And that is one of the things that we tackled in the,
00:47:24.180 | basically saying, okay, these are the prior knowledge
00:47:27.180 | which will be derived from the world,
00:47:29.500 | but then how is that prior knowledge represented
00:47:32.580 | in the model such that inference,
00:47:35.380 | when some piece of evidence comes in,
00:47:37.340 | can be done very efficiently and in a very distributed way.
00:47:40.980 | Because it is very,
00:47:43.300 | there are so many ways of representing knowledge,
00:47:45.500 | which is not amenable to very quick inference,
00:47:49.140 | you know, quick lookups.
00:47:50.820 | And so that's one core part of what we tackled
00:47:55.260 | in the RCN model.
00:47:57.460 | How do you encode visual knowledge
00:47:59.820 | to do very quick inference?
00:48:02.180 | And yeah.
00:48:03.020 | - Can you maybe comment on,
00:48:04.540 | so folks listening to this in general
00:48:07.100 | may be familiar with different kinds of architectures
00:48:09.980 | of neural networks.
00:48:11.140 | What are we talking about with the RCN?
00:48:14.620 | What does the architecture look like?
00:48:17.260 | What are the different components?
00:48:18.860 | Is it close to neural networks?
00:48:20.140 | Is it far away from neural networks?
00:48:21.940 | What does it look like?
00:48:22.780 | - Yeah, so you can think of the delta between the model
00:48:27.140 | and a convolutional neural network,
00:48:28.700 | if people are familiar with convolutional neural networks.
00:48:31.580 | So convolutional neural networks
00:48:32.820 | have this feed-forward processing cascade,
00:48:35.260 | which is called feature detectors and pooling.
00:48:38.620 | And that is repeated in the hierarchy,
00:48:40.620 | in a multi-level system.
00:48:43.660 | And if you want an intuitive idea of what is happening,
00:48:47.820 | feature detectors are, you know,
00:48:50.460 | detecting interesting co-occurrences in the input.
00:48:54.340 | It can be a line, a corner, an eye,
00:48:58.460 | or a piece of texture, et cetera.
00:49:00.860 | And the pooling neurons are doing
00:49:04.180 | some local transformation of that
00:49:06.340 | and making it invariant to local transformations.
00:49:08.820 | So this is what the structure
00:49:09.780 | of convolutional neural network is.
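To make the feature-detection-plus-pooling idea concrete, here is a minimal NumPy sketch of one such stage. It is only an illustration of the intuition described above, not code from the model: the image, the edge filter, and the sizes are invented.

```python
# A minimal sketch of one "feature detection + pooling" stage, in plain NumPy.
# The filter, image, and sizes here are made up for illustration.
import numpy as np

def detect_features(image, kernel):
    """Valid 2-D cross-correlation: responds where the local patch matches the kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)  # keep only positive evidence (ReLU-style)

def max_pool(feature_map, size=2):
    """Take the max in each size x size block: invariance to small local shifts."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.zeros((8, 8))
image[:, 4] = 1.0                                  # a vertical line
vertical_edge = np.array([[-1., 1.], [-1., 1.]])   # crude vertical-edge detector
pooled = max_pool(detect_features(image, vertical_edge))
print(pooled)  # strong responses along the line, coarsened by pooling
```

Stacking several such stages, each looking at the pooled output of the one below, gives the multi-level cascade described here.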
00:49:11.580 | Recursive cortical network has a similar structure
00:49:16.540 | when you look at just the feed-forward pathway.
00:49:18.900 | But in addition to that,
00:49:20.020 | it is also structured in a way that it is generative.
00:49:22.660 | So that it can run it backward
00:49:25.660 | and combine the forward with the backward.
00:49:28.500 | Another aspect that it has is it has lateral connections.
00:49:33.340 | These lateral connections, which is between,
00:49:37.660 | so if you have an edge here and an edge here,
00:49:40.220 | it has connections between these edges.
00:49:42.060 | It is not just feed-forward connections.
00:49:43.780 | There are connections between these edges,
00:49:47.380 | between the nodes representing these edges,
00:49:50.300 | which are there to enforce compatibility between them.
00:49:53.020 | So otherwise what will happen is that-
00:49:54.220 | - Like constraints?
00:49:55.220 | - It's a constraint.
00:49:56.060 | It's basically, if you do just feature detection
00:49:59.620 | followed by pooling,
00:50:01.300 | then your transformations in different parts
00:50:04.460 | of the visual field are not coordinated.
00:50:06.500 | And so you will create jagged,
00:50:10.620 | when you generate from the model,
00:50:12.060 | you will create jagged things
00:50:14.820 | and uncoordinated transformations.
00:50:17.420 | So these lateral connections
00:50:18.860 | are enforcing the transformations.
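To make the lateral-connection point concrete, here is a toy sketch (not the actual RCN) of two neighboring edge nodes that each choose a local shift. The shift values, evidence scores, and compatibility factor are all invented; the point is only that a pairwise lateral factor makes the jointly best choice coordinated rather than jagged.

```python
# Toy sketch of lateral compatibility between two neighboring "pooled" edge features.
# Each node picks a local shift; the lateral factor penalizes neighbors whose
# shifts disagree, so the jointly best assignment is a smooth contour.
import itertools

shifts = [-1, 0, 1]                        # hypothetical local transformations
local_evidence = {                         # made-up unary scores per node
    "edge_A": {-1: 0.2, 0: 0.6, 1: 0.9},   # noisy evidence pulls A toward +1
    "edge_B": {-1: 0.9, 0: 0.6, 1: 0.2},   # ...and B toward -1
}

def lateral(s_a, s_b, strength=1.0):
    """Compatibility factor: high when neighboring shifts agree."""
    return -strength * abs(s_a - s_b)

# Independent (no lateral connections): each node maximizes its own evidence.
independent = {n: max(shifts, key=lambda s: ev[s]) for n, ev in local_evidence.items()}

# Joint MAP with the lateral constraint, by brute-force enumeration.
joint = max(
    itertools.product(shifts, shifts),
    key=lambda sb: local_evidence["edge_A"][sb[0]]
    + local_evidence["edge_B"][sb[1]]
    + lateral(sb[0], sb[1]),
)

print("independent choices:", independent)  # jagged: A = +1, B = -1
print("with lateral factor:", joint)        # coordinated: (0, 0)
```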
00:50:22.020 | - Is the whole thing still differentiable?
00:50:24.780 | - No.
00:50:25.620 | - Okay.
00:50:26.460 | - No.
00:50:27.300 | (laughs)
00:50:28.140 | It's not trained using a backprop.
00:50:30.260 | - Okay, that's really important.
00:50:31.540 | So there's these feed-forward,
00:50:33.940 | there's feedback mechanisms.
00:50:35.940 | There's some interesting connectivity things.
00:50:37.780 | It's still layered?
00:50:38.940 | Like-
00:50:39.780 | - Yes, there are multiple levels.
00:50:41.420 | - Multiple layers.
00:50:43.500 | Okay, very, very interesting.
00:50:45.860 | And yeah, okay, so the interconnections between adjacent nodes,
00:50:49.900 | so connections across, serve as constraints
00:50:53.100 | that keep the thing stable.
00:50:55.340 | - Correct.
00:50:56.260 | - Okay, so what else?
00:50:58.300 | - And then there's this idea of doing inference.
00:51:01.100 | A neural network does not do inference on the fly.
00:51:05.500 | So an example of why this inference is important is,
00:51:09.220 | so one of the first applications
00:51:11.780 | that we showed in the paper was to crack text-based CAPTCHAs.
00:51:16.780 | - What are CAPTCHAs, by the way?
00:51:19.180 | (laughs)
00:51:20.020 | - Yeah.
00:51:20.860 | - By the way, one of the most awesome,
00:51:22.140 | like the people don't use this term anymore,
00:51:24.140 | it's human computation, I think.
00:51:25.780 | I love this term.
00:51:28.020 | The guy who created CAPTCHAs,
00:51:29.620 | I think came up with this term.
00:51:30.860 | - Yeah.
00:51:31.700 | - I love it.
00:51:32.540 | Anyway, what are CAPTCHAs?
00:51:35.580 | - So CAPTCHAs are those strings that you fill in
00:51:39.860 | when you're, you know,
00:51:40.820 | if you're opening a new account in Google,
00:51:43.300 | they show you a picture, you know,
00:51:45.380 | usually it used to be set of garbled letters
00:51:48.700 | that you have to kind of figure out
00:51:50.660 | what is that string of characters and type it.
00:51:53.340 | And the reason CAPTCHAs exist is because, you know,
00:51:57.740 | Google or Twitter do not want
00:52:00.980 | automatic creation of accounts.
00:52:02.660 | You can use a computer to create millions of accounts
00:52:06.540 | and use that for nefarious purposes.
00:52:10.740 | So you want to make sure that, to the extent possible,
00:52:13.860 | the interaction that their system is having
00:52:16.940 | is with a human.
00:52:18.020 | So it's called a human interaction proof.
00:52:20.900 | A CAPTCHA is a human interaction proof.
00:52:23.540 | So this is, CAPTCHAs are by design,
00:52:27.340 | things that are easy for humans to solve,
00:52:29.820 | but hard for computers.
00:52:30.820 | - Hard for robots, yeah.
00:52:31.860 | - Yeah.
00:52:32.780 | So, and text-based CAPTCHAs were the ones which were prevalent
00:52:37.780 | until around 2014,
00:52:39.780 | because at that time, text-based CAPTCHAs
00:52:42.260 | were hard for computers to crack.
00:52:44.740 | Even now, they are actually,
00:52:46.780 | in the sense of an arbitrary text-based CAPTCHA
00:52:49.780 | will be unsolvable even now.
00:52:51.740 | But with the techniques that we have developed,
00:52:53.940 | it can be, you know, you can quickly develop a mechanism
00:52:56.380 | that solves the CAPTCHA.
00:52:58.940 | - They've probably gotten a lot harder too.
00:53:00.740 | The people, they've been getting cleverer and cleverer
00:53:03.860 | generating these text CAPTCHAs.
00:53:05.380 | - Correct, correct.
00:53:06.300 | - So, okay, so that was one of the things you've tested it on
00:53:09.420 | is these kinds of CAPTCHAs in 2014, '15, that kind of stuff.
00:53:13.740 | So what, I mean, by the way, why CAPTCHAs?
00:53:18.500 | - Yeah, yeah.
00:53:19.740 | Even now, I would say a CAPTCHA
00:53:21.420 | is a very, very good challenge problem.
00:53:23.580 | If you want to understand how human perception works
00:53:27.380 | and if you want to build systems that work
00:53:30.500 | like the human brain.
00:53:32.300 | And I wouldn't say CAPTCHAs are a solved problem.
00:53:34.860 | We have cracked the fundamental defense of CAPTCHAs,
00:53:37.620 | but it is not solved in the way that humans solve it.
00:53:41.980 | So I can give an example.
00:53:43.420 | I can take a five-year-old child
00:53:46.500 | who has just learned characters
00:53:48.860 | and show them any new CAPTCHA that we create.
00:53:53.220 | They will be able to solve it.
00:53:54.740 | I can show you pretty much any new CAPTCHA
00:53:58.980 | from any new website.
00:54:00.620 | You'll be able to solve it
00:54:01.580 | without getting any training examples
00:54:04.060 | from that particular style of CAPTCHA.
00:54:05.860 | - You're assuming I'm human, yeah.
00:54:07.180 | - Yes, yeah, that's right.
00:54:10.020 | So if you are human,
00:54:11.100 | otherwise I will be able to figure that out using this one.
00:54:15.740 | - This whole podcast is just a Turing test,
00:54:19.220 | a long Turing test.
00:54:20.980 | Anyway, I'm sorry.
00:54:21.820 | So yeah, so humans can figure it out
00:54:24.700 | with very few examples.
00:54:26.260 | - Or no training examples.
00:54:28.220 | No training examples from that particular style of CAPTCHA.
00:54:31.180 | And so even now this is unreachable
00:54:35.980 | for the current deep learning system.
00:54:38.220 | So basically there is no,
00:54:39.860 | I don't think a system exists
00:54:41.220 | where you can basically say,
00:54:42.460 | train on whatever you want.
00:54:44.300 | And then now say,
00:54:45.740 | hey, I will show you a new capture,
00:54:47.660 | which I did not show you in the training setup.
00:54:50.940 | Will the system be able to solve it?
00:54:52.740 | Still doesn't exist.
00:54:54.820 | So that is the magic of human perception.
00:54:58.020 | And Doug Hofstadter put this very beautifully
00:55:02.180 | in one of his talks.
00:55:04.700 | The central problem in AI is what is the letter A?
00:55:10.500 | If you can build a system that reliably can detect
00:55:15.140 | all the variations of the letter A,
00:55:16.980 | you don't even need to go to the...
00:55:18.780 | - The B and the C.
00:55:20.860 | - Yeah, you don't even need to go to the B and the C
00:55:22.620 | or the strings of characters.
00:55:24.140 | And so that is the spirit at which,
00:55:27.140 | with which we tackle that problem.
00:55:28.820 | - What does he mean by that?
00:55:30.060 | I mean, is it like without training examples,
00:55:34.900 | try to figure out the fundamental elements
00:55:39.020 | that make up the letter A in all of its forms?
00:55:43.620 | - In all of its forms.
00:55:44.460 | It can be, A can be made with two humans standing,
00:55:47.060 | leaning against each other, holding the hands.
00:55:49.420 | And it can be made of leaves.
00:55:51.580 | It can be...
00:55:52.420 | - Yeah, you might have to understand everything
00:55:54.780 | about this world in order to understand the letter A.
00:55:57.100 | - Exactly.
00:55:58.100 | - So it's common sense reasoning, essentially.
00:56:00.540 | - Right.
00:56:01.380 | So to finally, to really solve,
00:56:03.700 | finally to say that you have solved CAPTCHAs,
00:56:07.980 | you have to solve the whole problem.
00:56:09.580 | (both laughing)
00:56:11.260 | - Yeah, okay.
00:56:12.100 | So how does this kind of the RCN architecture
00:56:15.780 | help us to get, do a better job of that kind of thing?
00:56:18.940 | - Yeah, so as I mentioned,
00:56:21.260 | one of the important things was being able to do inference,
00:56:25.100 | being able to dynamically do inference.
00:56:27.220 | - Can you clarify what you mean?
00:56:30.700 | 'Cause you said like neural networks don't do inference.
00:56:33.180 | - Yeah.
00:56:34.020 | - So what do you mean by inference in this context then?
00:56:35.940 | - So, okay, so in CAPTCHAs,
00:56:38.380 | what they do to confuse people
00:56:40.380 | is to make these characters crowd together.
00:56:43.460 | - Yes.
00:56:44.300 | - Okay, and when you make the characters crowd together,
00:56:46.740 | what happens is that you will now start seeing
00:56:49.340 | combinations of characters as some other new character
00:56:52.340 | or an existing character.
00:56:53.780 | So you would put an R and N together,
00:56:56.820 | it will start looking like an M.
00:56:59.140 | And so locally, there is very strong evidence
00:57:03.660 | for it being some incorrect character.
00:57:08.660 | But globally, the only explanation that fits together
00:57:12.620 | is something that is different
00:57:14.260 | from what you can find locally.
00:57:16.140 | - Yes.
00:57:16.980 | - So this is inference.
00:57:18.860 | You are basically taking local evidence
00:57:22.260 | and putting it in the global context
00:57:24.620 | and often coming to a conclusion locally,
00:57:27.780 | which is conflicting with the local information.
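A toy illustration of that crowding example, with an invented stroke alphabet, invented letter templates, and a one-word lexicon standing in for global knowledge: by construction, the strokes of 'r' followed by 'n' are locally identical to the strokes of 'm', and only the global parse settles it.

```python
# A toy sketch of "explaining away" under crowding: strokes that locally look
# like an 'm' are globally better explained as 'r' followed by 'n'. The stroke
# alphabet, letter templates, and lexicon are all invented for illustration.
STROKES = {          # each letter as a made-up sequence of stroke tokens
    "a": "o|", "b": "|o", "m": "|n|n|", "n": "|n|", "r": "|n",
}
LEXICON = {"barn"}   # global knowledge: which whole-word parses are plausible

def render(word):
    """Generate the stroke sequence a word would produce (crowded, no gaps)."""
    return "".join(STROKES[ch] for ch in word)

def parse(observed, prefix=""):
    """Enumerate words that exactly explain the observed strokes."""
    if not observed:
        yield prefix
        return
    for letter, strokes in STROKES.items():
        if observed.startswith(strokes):
            yield from parse(observed[len(strokes):], prefix + letter)

obs = render("barn")
candidates = list(parse(obs))
print("all parses that fit the strokes:", candidates)   # includes 'bam' and 'barn'
print("best global explanation:", [w for w in candidates if w in LEXICON])
```

A local window over the middle strokes supports the 'm' reading, but the only explanation compatible with the global context is 'r' then 'n'; the local evidence gets explained away.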
00:57:30.100 | - So actually, so you mean inference like
00:57:33.940 | in the way it's used when you talk about reasoning,
00:57:36.660 | for example, as opposed to like inference,
00:57:39.420 | which is with artificial neural networks,
00:57:42.380 | which is a single pass to the network.
00:57:44.020 | - Correct.
00:57:44.860 | - Okay.
00:57:45.700 | So like you're basically doing some basic forms of reasoning
00:57:48.740 | like integration of like how local things fit
00:57:52.780 | into the global picture.
00:57:54.140 | - And things like explaining away coming into this one
00:57:56.940 | because you are explaining that piece of evidence
00:58:01.420 | as something else because globally,
00:58:04.220 | that's the only thing that makes sense.
00:58:06.180 | So now you can amortize this inference by,
00:58:11.180 | you know, in a neural network,
00:58:13.260 | if you want to do this, you can brute force it.
00:58:16.260 | You can just show it all combinations of things
00:58:19.460 | that you want your reasoning to work over.
00:58:23.260 | And you can, you know, like just train the hell
00:58:26.140 | out of that neural network.
00:58:27.420 | And it will look like it is doing, you know,
00:58:30.540 | inference on the fly,
00:58:31.460 | but it is really just doing amortized inference.
00:58:35.020 | It is because you have shown it a lot of these combinations
00:58:38.620 | during training time.
00:58:40.460 | So what you want to do is be able to do dynamic inference
00:58:44.540 | rather than just being able to show all those combinations
00:58:47.420 | in the training time.
00:58:48.620 | And that's something we emphasized in the model.
00:58:51.780 | - What does it mean dynamic inference?
00:58:53.900 | Is that that has to do with the feedback thing?
00:58:56.300 | - Yes.
00:58:57.140 | - Like what is dynamic?
00:58:58.740 | I'm trying to visualize what dynamic inference
00:59:01.500 | would be in this case.
00:59:02.500 | Like, what is it doing with the input?
00:59:05.180 | It's shown the input the first time.
00:59:07.780 | - Yeah.
00:59:08.620 | - And it's like, what's changing over temporarily?
00:59:12.420 | What's the dynamics of this inference process?
00:59:14.860 | - So you can think of it as you have at the top of the model,
00:59:18.900 | the characters that you are trained on,
00:59:21.220 | they are the causes.
00:59:23.020 | You're trying to explain the pixels
00:59:25.780 | using the characters as the causes.
00:59:28.900 | The characters are the things that cause the pixels.
00:59:32.260 | - Yeah, so there's this causality thing.
00:59:34.980 | So the reason you mentioned causality, I guess,
00:59:37.780 | is because there's a temporal aspect to this whole thing.
00:59:40.860 | - In this particular case,
00:59:41.900 | the temporal aspect is not important.
00:59:43.380 | It is more like when, if I turn the character on,
00:59:47.740 | the pixels will turn on.
00:59:49.180 | Yeah, it will be after this a little bit, but yeah.
00:59:51.060 | - Okay, so it's causality in the sense
00:59:53.300 | of like a logic causality, like hence inference, okay.
00:59:57.540 | - The dynamics is that even though locally,
01:00:01.340 | it will look like, okay, this is an A.
01:00:03.860 | And locally, just when I look at just that patch of the image,
01:00:09.140 | it looks like an A.
01:00:10.660 | But when I look at it in the context of all the other causes,
01:00:14.500 | it might not, A is not something that makes sense.
01:00:17.380 | So that is something you have to kind of,
01:00:19.220 | you know, recursively figure out.
01:00:20.660 | - Yeah, so, okay, so, and this thing performed pretty well
01:00:25.340 | on the CAPTCHAs.
01:00:26.260 | - Correct.
01:00:27.100 | - And I mean, is there some kind of interesting intuition
01:00:32.100 | you can provide why it did well?
01:00:34.100 | Like, what did it look like?
01:00:36.060 | Is there visualizations that could be human interpretable
01:00:38.620 | to us humans?
01:00:39.740 | - Yes, yeah, so the good thing about the model
01:00:41.980 | is that it is extremely,
01:00:44.500 | so it is not just doing a classification, right?
01:00:46.620 | It is providing a full explanation for the scene.
01:00:50.540 | So when it operates on a scene,
01:00:55.180 | it is coming at back and saying,
01:00:57.100 | look, this is the part is the A,
01:00:59.780 | and these are the pixels that turned on,
01:01:02.140 | these are the pixels in the input that tells,
01:01:05.700 | makes me think that it is an A.
01:01:08.100 | And also these are the portions I hallucinated.
01:01:10.940 | It provides a complete explanation of that form.
01:01:15.340 | And then these are the contours, this is the interior,
01:01:19.660 | and this is in front of this other object.
01:01:22.660 | So that's the kind of explanation
01:01:25.700 | the inference network provides.
01:01:28.500 | So that is useful and interpretable.
01:01:32.140 | And then the kind of errors it makes are also,
01:01:38.460 | I don't want to read too much into it,
01:01:42.980 | but the kind of errors the network makes
01:01:45.220 | are very similar to the kinds of errors
01:01:48.180 | humans would make in a similar situation.
01:01:50.420 | - So there's something about the structure
01:01:51.940 | that feels reminiscent of the way
01:01:54.420 | humans' visual system works.
01:01:58.900 | Well, I mean, how hard-coded is this
01:02:01.900 | to the capture problem, this idea?
01:02:04.420 | - Not really hard-coded because it's the,
01:02:06.820 | the assumptions, as I mentioned, are general, right?
01:02:09.340 | It is more, and those themselves can be applied
01:02:13.300 | in many situations which are natural signals.
01:02:17.180 | So it's the foreground versus background factorization
01:02:20.700 | and the factorization of the surfaces versus the contours.
01:02:25.460 | So these are all generally applicable assumptions.
01:02:27.700 | - In all vision.
01:02:29.060 | So why CAPTCHAs, why attack the CAPTCHA problem,
01:02:34.060 | which is quite unique in the computer vision context
01:02:36.700 | versus like the traditional benchmarks of ImageNet
01:02:40.900 | and all those kinds of image classification
01:02:43.740 | or even segmentation tasks and all that kind of stuff.
01:02:46.180 | Do you feel like that's, I mean,
01:02:48.340 | what's your thinking about those kinds of benchmarks
01:02:50.780 | in this context?
01:02:53.540 | - I mean, those benchmarks are useful
01:02:55.220 | for deep learning kind of algorithms where you,
01:02:58.620 | so the settings that deep learning works in are,
01:03:02.300 | here is my huge training set and here is my test set.
01:03:05.940 | So the training set is almost 100x, 1000x bigger
01:03:10.420 | than the test set in many cases.
01:03:14.140 | What we wanted to do was invert that.
01:03:17.500 | The training set is way smaller than the test set.
01:03:21.860 | - Yes.
01:03:22.700 | - And, you know, a CAPTCHA is a problem that is by definition
01:03:27.700 | hard for computers and it has these good properties
01:03:33.100 | of strong generalization, strong out of training
01:03:36.820 | distribution generalization.
01:03:38.180 | If you are interested in studying that and putting,
01:03:42.580 | having your model have that property,
01:03:44.580 | then it's a good data set to tackle.
01:03:46.820 | - So is there, have you attempted to,
01:03:49.300 | which I think, I believe there's quite a growing body
01:03:52.460 | of work on looking at MNIST and ImageNet without training.
01:03:57.460 | So like taking like the basic challenges,
01:04:01.460 | how, what tiny fraction of the training set can we take
01:04:05.760 | in order to do a reasonable job of the classification task?
01:04:10.460 | Have you explored that angle in these classic benchmarks?
01:04:15.020 | - Yes, so we did do MNIST.
01:04:16.580 | So, you know, so it's not just CAPTCHAs.
01:04:19.380 | So there was also versions of, multiple versions of MNIST,
01:04:24.380 | including the standard version,
01:04:27.340 | which where we inverted the problem,
01:04:28.940 | which is basically saying rather than train
01:04:30.780 | on 60,000 training data, you know,
01:04:33.700 | how quickly can you get to high level accuracy
01:04:38.500 | with very little training data?
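A rough sketch of that inverted protocol, using scikit-learn's small 8x8 digits dataset and a plain logistic regression purely as stand-ins; the actual experiments used MNIST and the RCN model, and none of the numbers printed here are theirs.

```python
# Sketch of the "invert the ratio" protocol: tiny training sets, large test set.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.8, random_state=0, stratify=y)  # most data held out for testing

for n_train in [10, 50, 100, 300]:
    X_tr, y_tr = X_pool[:n_train], y_pool[:n_train]
    clf = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
    print(f"{n_train:4d} training examples -> test accuracy {clf.score(X_test, y_test):.3f}")
```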
01:04:40.060 | - Is there some performance that you remember,
01:04:42.220 | like how well, how well did it do?
01:04:45.260 | How many examples did it need?
01:04:47.420 | - Yeah, I, you know, I remember that it was,
01:04:51.260 | you know, on the order of tens or hundreds of examples
01:04:56.260 | to get into 95% accuracy.
01:04:59.580 | And it was, it was definitely better than the systems,
01:05:02.180 | other systems out there at that time.
01:05:03.940 | - At that time. - Yeah.
01:05:04.820 | - Yeah, they're really pushing it.
01:05:05.900 | I think that's a really interesting space, actually.
01:05:09.340 | I think there's an actual name for MNIST that,
01:05:14.340 | like there's different names
01:05:16.740 | for the different sizes of training sets.
01:05:19.220 | I mean, people are like attacking this problem.
01:05:21.260 | I think it's super interesting.
01:05:22.900 | It's funny how like the MNIST will probably be with us
01:05:27.060 | all the way to AGI.
01:05:28.660 | - Yes. (laughs)
01:05:29.900 | - It's a data set that just sticks by.
01:05:31.700 | It is, it's a clean, simple data set
01:05:35.860 | to study the fundamentals of learning
01:05:38.140 | with just like CAPTCHAs, it's interesting.
01:05:40.460 | Not enough people, I don't know, maybe you can correct me,
01:05:43.900 | but I feel like CAPTCHAs don't show up as often in papers
01:05:47.140 | as they probably should.
01:05:48.380 | - That's correct, yeah.
01:05:49.540 | Because, you know, usually these things have a momentum,
01:05:53.700 | you know, once something gets established
01:05:57.660 | as a standard benchmark, there is a dynamics
01:06:02.340 | of how graduate students operate
01:06:04.660 | and how academic system works
01:06:07.420 | that pushes people to track that benchmark.
01:06:10.860 | - Yeah, to focus.
01:06:12.140 | - Yeah.
01:06:12.980 | - Nobody wants to think outside the box, okay.
01:06:15.540 | - Yes.
01:06:16.780 | - Okay, so good performance on the CAPTCHAs.
01:06:20.700 | What else is there interesting on the RCN side
01:06:23.940 | before we talk about the cortical microcircuits?
01:06:25.700 | - Yeah, so the same model,
01:06:27.780 | so the important part of the model
01:06:30.620 | was that it trains very quickly
01:06:32.260 | with very little training data.
01:06:33.860 | And it's quite robust to out-of-distribution perturbations.
01:06:38.860 | And we are using that very fruitfully
01:06:44.300 | and advantageously in many of the robotics tasks
01:06:47.260 | we are solving.
01:06:48.100 | - You're solving that.
01:06:48.940 | Well, let me ask you this kind of touchy question.
01:06:51.940 | I have to, I've spoken with your friend, colleague,
01:06:56.100 | Jeff Hawkins, too.
01:06:56.940 | I mean, I have to kind of ask, there is a bit of,
01:07:02.100 | whenever you have brain-inspired stuff
01:07:04.740 | and you make big claims, big sexy claims,
01:07:08.320 | there's critics, I mean, machine learning subreddit.
01:07:14.660 | Don't get me started on those people.
01:07:16.500 | I mean, criticism is good, but they're a bit over the top.
01:07:21.460 | There is quite a bit of sort of skepticism and criticism.
01:07:26.660 | Is this work really as good as it promises to be?
01:07:30.540 | Do you have thoughts on that kind of skepticism?
01:07:34.860 | Do you have comments on the kind of criticism
01:07:36.780 | we might've received about, is this approach legit?
01:07:41.460 | Is this a promising approach?
01:07:44.620 | Or at least as promising as it seems to be advertised as?
01:07:49.620 | - Yeah, I can comment on it.
01:07:51.700 | So our RCN paper is published in Science,
01:07:55.300 | which I would argue is a very high quality journal,
01:07:58.420 | very hard to publish in.
01:08:00.260 | And usually it is indicative of the quality of the work.
01:08:04.540 | And I am very, very certain that the ideas
01:08:09.540 | that we brought together in that paper
01:08:13.420 | in terms of the importance of feedback connections,
01:08:16.580 | recursive inference, lateral connections,
01:08:19.260 | coming to best explanation of the scene
01:08:21.940 | as the problem to solve,
01:08:23.500 | trying to solve recognition, segmentation, all jointly
01:08:28.780 | in a way that is compatible with higher level cognition,
01:08:31.420 | top-down attention, all those ideas
01:08:33.100 | that we brought together into something coherent
01:08:35.460 | and workable in the world and tackling a challenging problem,
01:08:39.860 | I think that will stay and that contribution I stand by.
01:08:44.300 | Now, I can tell you a story which is funny
01:08:48.340 | in the context of this.
01:08:49.980 | So if you read the abstract of the paper
01:08:51.980 | and the argument we are putting in,
01:08:55.140 | look, current deep learning systems
01:08:56.780 | take a lot of training data.
01:08:58.780 | They don't use these insights.
01:09:00.700 | And here is our new model,
01:09:02.540 | which is not a deep neural network, it's a graphical model.
01:09:04.700 | It does inference.
01:09:05.540 | This is what the paper is, right?
01:09:07.380 | Now, once the paper was accepted and everything,
01:09:10.020 | it went to the press department in Science,
01:09:13.900 | AAAS Science Office.
01:09:15.180 | We didn't do any press release when it was published.
01:09:17.340 | It went to the press department.
01:09:19.060 | What was the press release that they wrote up?
01:09:21.900 | A new deep learning model. (laughs)
01:09:24.900 | - Solves CAPTCHAs.
01:09:26.100 | - Solves CAPTCHAs.
01:09:27.260 | And so you can see what was being hyped in that thing.
01:09:32.260 | So it's like there is a dynamic in the community.
01:09:37.780 | That especially happens when there are lots of new people
01:09:43.940 | coming into the field and they get attracted to one thing.
01:09:46.860 | And some people are trying to think different
01:09:49.820 | compared to that.
01:09:50.660 | So there is some, I think skepticism in science
01:09:53.820 | is important and it is very much required.
01:09:58.460 | But it's also, it's not skepticism usually,
01:10:01.540 | it's mostly bandwagon effect that is happening
01:10:04.060 | rather than-
01:10:04.900 | - But that's not even that.
01:10:06.980 | I mean, I'll tell you what they react to,
01:10:09.060 | which is like, I'm sensitive to as well.
01:10:12.220 | If you look at just companies, OpenAI, DeepMind,
01:10:15.220 | Vicarious, I mean, there's a little bit of a race
01:10:23.260 | to the top and hype, right?
01:10:25.500 | It's like, it doesn't pay off to be humble.
01:10:29.740 | (laughs)
01:10:31.940 | So like, and the press is just irresponsible often.
01:10:36.940 | They just, I mean, don't get me started
01:10:38.900 | on the state of journalism today.
01:10:41.380 | Like, it seems like the people who write articles
01:10:43.580 | about these things, they literally have not even spent
01:10:47.620 | an hour on the Wikipedia article
01:10:49.500 | about what is neural networks.
01:10:51.780 | They haven't even invested in learning the language, just out of laziness.
01:10:56.780 | It's like, robots beat humans.
01:11:01.900 | Like, they write this kind of stuff that just,
01:11:06.020 | and then of course the researchers are quite sensitive
01:11:09.300 | to that because it gets a lot of attention.
01:11:12.260 | They're like, why did this word get so much attention?
01:11:15.020 | That's over the top and people get really sensitive.
01:11:19.260 | - The same kind of criticism with,
01:11:21.540 | OpenAI did work with Rubik's Cube with the robot
01:11:24.540 | that people criticized.
01:11:26.700 | Same with GPT-2 and 3, they criticize.
01:11:30.220 | Same thing with DeepMind with AlphaZero.
01:11:33.700 | I mean, yeah, I'm sensitive to it, but,
01:11:37.340 | and of course with your work, you mentioned deep learning,
01:11:40.260 | but there's something super sexy to the public
01:11:43.460 | about brain-inspired.
01:11:45.540 | I mean, that immediately grabs people's imagination.
01:11:48.440 | Not even like neural networks, but like really brain-inspired.
01:11:53.440 | - Got it.
01:11:54.300 | - Like brain-like neural networks.
01:11:57.500 | That seems really compelling to people and to me as well,
01:12:01.180 | to the world as a narrative.
01:12:03.500 | And so people hook up, hook onto that,
01:12:07.580 | and sometimes the skepticism engine turns on
01:12:12.300 | in the research community and they're skeptical.
01:12:15.040 | But I think putting aside the ideas
01:12:19.060 | of the actual performance on CAPTCHAs
01:12:20.900 | or performance on any dataset,
01:12:22.940 | I mean, to me, all these datasets are useless anyway.
01:12:26.820 | It's nice to have them, but in the grand scheme of things,
01:12:30.220 | they're silly toy examples.
01:12:32.180 | The point is, is there intuition about the ideas,
01:12:37.120 | just like you mentioned,
01:12:38.540 | bringing the ideas together in a unique way?
01:12:41.540 | Is there something there?
01:12:42.780 | Is there some value there?
01:12:44.180 | And is it gonna stand the test of time?
01:12:45.920 | - Yes.
01:12:46.760 | - And that's the hope.
01:12:47.580 | - Yes.
01:12:48.420 | - That's the hope.
01:12:49.320 | - My confidence there is very high.
01:12:51.680 | I don't treat brain-inspired as a marketing term.
01:12:55.040 | I am looking into the details of biology
01:13:00.080 | and puzzling over those things,
01:13:03.320 | and I am grappling with those things.
01:13:05.840 | And so it is not a marketing term at all.
01:13:08.920 | You can use it as a marketing term,
01:13:10.440 | and people often use it.
01:13:12.240 | And you can get lumped in with them.
01:13:14.500 | And when people don't understand
01:13:16.700 | how we are approaching the problem,
01:13:18.180 | it is easy to be misunderstood
01:13:21.320 | and think of it as purely marketing,
01:13:24.300 | but that's not the way we are.
01:13:26.220 | - So you really, I mean, as a scientist,
01:13:30.100 | you believe that if we kinda just stick
01:13:33.240 | to really understanding the brain,
01:13:35.380 | that's the right, like you should constantly meditate
01:13:39.740 | on the how does the brain do this?
01:13:42.580 | 'Cause that's going to be really helpful
01:13:44.240 | for engineering intelligence systems.
01:13:46.560 | - Yes, you need to, so I think it's one input,
01:13:49.800 | and it is helpful, but you should know
01:13:53.820 | when to deviate from it too.
01:13:56.280 | So an example is convolutional neural networks, right?
01:13:59.680 | Convolution is not an operation brain implements.
01:14:04.280 | The visual cortex is not convolutional.
01:14:07.400 | Visual cortex has local receptive fields,
01:14:10.180 | local connectivity, but there is no translational
01:14:15.180 | invariance in the network weights in the visual cortex.
01:14:21.180 | That is a computational trick,
01:14:25.720 | which is a very good engineering trick
01:14:27.200 | that we use for sharing the training
01:14:29.780 | between the different nodes.
01:14:31.780 | So, and that trick will be with us for some time.
01:14:35.580 | It will go away when we have robots
01:14:38.180 | with eyes and heads that move.
01:14:43.460 | And so then that trick will go away.
01:14:45.460 | It will not be useful at that time.
01:14:48.740 | - So the brain doesn't have translational invariance.
01:14:53.060 | It has the focal point, like it has a thing it focuses on.
01:14:56.020 | - Correct, it has a fovea, and because of the fovea,
01:14:59.480 | the receptive fields are not like the copying of the weights,
01:15:04.140 | like the weights in the center are very different
01:15:06.660 | from the weights in the periphery.
01:15:07.820 | - Yes, at the periphery.
01:15:08.780 | I mean, I did this, actually wrote a paper
01:15:12.540 | and just gotten a chance to really study peripheral vision,
01:15:16.820 | which is a fascinating thing.
01:15:19.060 | A very poorly understood thing: what the brain does,
01:15:24.060 | at every level, with the periphery.
01:15:28.140 | It does some funky stuff.
01:15:29.740 | So it's another kind of trick than convolution.
01:15:34.740 | Like it does, it's, you know,
01:15:39.900 | convolution in neural networks is a trick for efficiency,
01:15:43.700 | is an efficiency trick.
01:15:45.180 | And the brain does a whole nother kind of thing, I guess.
01:15:47.540 | - Correct, correct.
01:15:48.420 | So you need to understand the principles
01:15:51.180 | of processing so that you can still apply engineering tricks
01:15:55.740 | where you want it to.
01:15:56.580 | You don't want to be slavishly mimicking
01:15:58.500 | all the things of the brain.
01:16:00.580 | And so, yeah, so it should be one input
01:16:02.580 | and I think it is extremely helpful,
01:16:05.100 | but it should be the point of really understanding
01:16:08.420 | so that you know when to deviate from it.
01:16:10.860 | - So, okay, that's really cool.
01:16:13.260 | That's work from a few years ago.
01:16:15.860 | So you did work in Numenta with Jeff Hawkins
01:16:19.260 | with hierarchical temporal memory.
01:16:23.340 | How is your just, if you could just give a brief history,
01:16:27.340 | how is your view of the way the models of the brain changed
01:16:31.820 | over the past few years leading up to now?
01:16:35.660 | Is there some interesting aspects
01:16:37.360 | where there was an adjustment
01:16:40.060 | to your understanding of the brain
01:16:41.660 | or is it all just building on top of each other?
01:16:44.020 | - In terms of the higher level ideas,
01:16:46.140 | especially the ones Jeff wrote about in the book,
01:16:49.140 | if you blur out, right, you know.
01:16:51.140 | - Yeah, on intelligence.
01:16:52.380 | - Right, on intelligence.
01:16:53.380 | If you blur out the details and if you just zoom out
01:16:56.580 | and at the higher level idea,
01:16:57.980 | things are, I would say, consistent
01:17:00.820 | with what he wrote about,
01:17:01.860 | but many things will be consistent with that
01:17:04.340 | because it's a blur, you know, when you,
01:17:07.100 | deep learning systems are also, you know,
01:17:09.260 | multi-level, hierarchical, all of those things, right?
01:17:12.020 | But in terms of the detail, a lot of things are different
01:17:18.620 | and those details matter a lot.
01:17:22.300 | So one point of difference I had with Jeff
01:17:26.860 | was how to approach, you know,
01:17:29.900 | how much of biological plausibility and realism
01:17:33.740 | do you want in the learning algorithms?
01:17:35.740 | So when I was there, this was, you know,
01:17:41.100 | almost 10 years ago now, so--
01:17:42.500 | - Yeah, flies when you're having fun.
01:17:44.540 | - I don't know what Jeff thinks now,
01:17:46.540 | but 10 years ago, the difference was that
01:17:49.940 | I did not want to be so constrained on saying,
01:17:54.820 | my learning algorithms need to be biologically plausible
01:17:58.380 | based on some filter of biological plausibility
01:18:02.020 | available at that time.
01:18:03.380 | To me, that is a dangerous cut to make
01:18:05.940 | because we are, you know,
01:18:08.060 | discovering more and more things about the brain
01:18:09.620 | all the time.
01:18:10.580 | New biophysical mechanisms, new channels
01:18:13.740 | are being discovered all the time.
01:18:15.500 | So I don't want to upfront kill off a learning algorithm
01:18:20.180 | just because we don't really understand
01:18:22.620 | the full biophysics or whatever of how the brain learns.
01:18:27.620 | - Exactly, exactly.
01:18:28.900 | - But let me ask, and I'm sorry to interrupt,
01:18:30.900 | like, what's your sense, what's our best understanding
01:18:34.740 | of how the brain learns?
01:18:36.620 | - So things like back propagation, credit assignment,
01:18:40.780 | so many of these algorithms have,
01:18:43.340 | learning algorithms have things in common, right?
01:18:45.540 | It is, back propagation is one way of credit assignment.
01:18:49.380 | There is another algorithm called expectation maximization,
01:18:52.660 | which is, you know, another weight adjustment algorithm.
01:18:56.100 | - But is it your sense the brain does something like this?
01:18:58.980 | - Has to, there is no way around it
01:19:01.380 | in the sense of saying that you do have to adjust
01:19:04.380 | the connections.
01:19:06.340 | - So, and you're saying credit assignment,
01:19:08.020 | you have to reward the connections that were useful
01:19:10.180 | in making a correct prediction and not,
01:19:12.620 | yeah, I guess, but yeah, it doesn't have to be differentiable.
01:19:17.620 | - Yeah, it doesn't have to be differentiable.
01:19:19.060 | - Yeah, but you have to have a, you know,
01:19:22.260 | you have a model that you start with,
01:19:24.380 | you have data comes in, and you have to have a way
01:19:27.700 | of adjusting the model such that it better fits the data.
01:19:32.220 | So that is all of learning, right?
01:19:34.660 | And some of them can be using backprop to do that.
01:19:38.020 | Some of it can be using, you know,
01:19:40.540 | very local graph changes to do that.
01:19:45.500 | That can, you know, many of these learning algorithms
01:19:48.780 | have similar update properties locally
01:19:52.820 | in terms of what the neurons need to do locally.
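A minimal sketch of that point, on made-up data: the same goal of adjusting connections to better fit the data can be met by a gradient step or by a gradient-free, keep-it-if-it-helps perturbation rule. Neither is claimed here to be what the brain actually does.

```python
# Two ways of doing credit assignment on a single linear unit (toy data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

def loss(w):
    return np.mean((X @ w - y) ** 2)

# 1) Gradient-based credit assignment (what backprop generalizes to deep nets).
w_grad = np.zeros(3)
for _ in range(200):
    grad = 2 * X.T @ (X @ w_grad - y) / len(y)
    w_grad -= 0.05 * grad

# 2) Gradient-free credit assignment: keep a random perturbation only if it helps.
w_pert = np.zeros(3)
for _ in range(2000):
    trial = w_pert + 0.05 * rng.normal(size=3)
    if loss(trial) < loss(w_pert):
        w_pert = trial

print("gradient descent  :", np.round(w_grad, 2), "loss", round(loss(w_grad), 4))
print("perturbation-only :", np.round(w_pert, 2), "loss", round(loss(w_pert), 4))
```

Both rules end up adjusting the weights to better fit the data; they differ in the machinery they assume, which is the open question being discussed here.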
01:19:57.340 | - I wonder if small differences in learning algorithms
01:19:59.820 | can have huge differences in the actual effect.
01:20:02.380 | So the dynamics of, I mean, sort of the reverse,
01:20:07.220 | like spiking, like if credit assignment is like a lightning
01:20:13.100 | versus like a rainstorm or something,
01:20:15.820 | like whether there's like a looping local type of situation
01:20:20.820 | with the credit assignment,
01:20:25.580 | whether there is like regularization,
01:20:29.460 | like how it injects robustness into the whole thing,
01:20:34.460 | like whether it's chemical or electrical or mechanical,
01:20:40.660 | all those kinds of things.
01:20:42.620 | - Yes.
01:20:43.460 | - I feel like it, that, yeah.
01:20:47.100 | I feel like those differences could be essential, right?
01:20:49.540 | - It could be.
01:20:50.380 | It's just that you don't know enough to,
01:20:53.980 | on the learning side, you don't know enough to say
01:20:58.460 | that is definitely not the way the brain does it.
01:21:01.180 | - Got it.
01:21:02.020 | So you don't want to be stuck to it.
01:21:02.980 | - Right.
01:21:03.820 | - So that, yeah.
01:21:04.660 | So you've been open-minded on that side of things.
01:21:07.060 | - Correct.
01:21:07.900 | On the inference side, on the recognition side,
01:21:09.220 | I am much more amenable to being constrained
01:21:12.420 | because it's much easier to do experiments
01:21:14.900 | because it's like, okay, here's the stimulus.
01:21:17.620 | How many steps did it take to get the answer?
01:21:19.780 | I can trace it back.
01:21:21.260 | I can understand the speed of that computation, et cetera,
01:21:24.860 | much more readily on the inference side.
01:21:27.900 | - Got it.
01:21:28.740 | And then you can't do good experiments on the learning side.
01:21:30.860 | - Correct.
01:21:31.700 | - So let's go right into cortical microcircuits right back.
01:21:37.500 | So what are these ideas
01:21:40.500 | beyond recursive cortical network that you're looking at now?
01:21:45.260 | - So we have made a pass through multiple of the steps
01:21:49.580 | that as I mentioned earlier,
01:21:52.700 | we were looking at perception from the angle of cognition.
01:21:56.340 | It was not just perception for perception's sake.
01:21:58.860 | How do you connect it to cognition?
01:22:01.620 | How do you learn concepts?
01:22:03.020 | And how do you learn abstract reasoning?
01:22:05.820 | Similar to some of the things Francois talked about.
01:22:09.980 | So we have taken one pass through it,
01:22:14.980 | basically saying,
01:22:16.420 | what is the basic cognitive architecture
01:22:19.540 | that you need to have,
01:22:21.020 | which has a perceptual system,
01:22:23.060 | which has a system that learns dynamics of the world,
01:22:25.980 | and then has something like a routine,
01:22:29.900 | program learning system on top of it to learn concepts.
01:22:33.220 | So we have built one, the version 0.1 of that system.
01:22:38.300 | This was another science robotics paper.
01:22:41.500 | It's the title of that paper was,
01:22:44.340 | something like cognitive programs.
01:22:45.860 | How do you build cognitive programs?
01:22:49.020 | - And the application there was on manipulation,
01:22:52.580 | robotic manipulation?
01:22:53.420 | - It was, so think of it like this.
01:22:56.100 | Suppose you wanted to tell a new person that you met,
01:23:01.100 | you don't know the language that person uses.
01:23:04.980 | You want to communicate to that person
01:23:07.340 | to achieve some task, right?
01:23:09.100 | So I want to say,
01:23:10.220 | hey, you need to pick up all the red cups
01:23:14.300 | from the kitchen counter and put it here, right?
01:23:17.300 | How do you communicate that, right?
01:23:19.300 | You can show pictures.
01:23:21.060 | You can basically say, look, this is the starting state.
01:23:23.580 | The things are here, this is the ending state.
01:23:27.060 | And what does the person need to understand from that?
01:23:30.020 | The person need to understand
01:23:31.540 | what conceptually happened in those pictures
01:23:33.700 | from the input to the output, right?
01:23:35.940 | So we are looking at pre-verbal conceptual understanding.
01:23:40.940 | Without language, how do you have a set of concepts
01:23:45.540 | that you can manipulate in your head?
01:23:48.300 | And from a set of images of input and output,
01:23:52.340 | can you infer what is happening in those images?
01:23:55.980 | - Got it, with concepts that are pre-language, okay.
01:23:59.140 | So what's it mean for a concept to be pre-language?
01:24:02.500 | Like, why is language so important here?
01:24:07.500 | - So I want to make a distinction between concepts
01:24:13.660 | that are just learned from text,
01:24:16.700 | by just feeding brute force text.
01:24:19.860 | You can start extracting things like, okay,
01:24:24.580 | cow is likely to be on grass.
01:24:27.420 | So those kinds of things you can extract purely from text.
01:24:32.660 | But that's kind of a simple association thing
01:24:35.860 | rather than a concept as an abstraction of something
01:24:39.060 | that happens in the real world, in a grounded way,
01:24:43.140 | that I can simulate it in my mind
01:24:45.780 | and connect it back to the real world.
01:24:48.620 | - And you think kind of the visual world,
01:24:52.180 | concepts in the visual world are somehow lower level
01:24:56.620 | than just the language?
01:24:58.940 | - The lower level kind of makes it feel like,
01:25:00.980 | okay, that's unimportant.
01:25:02.660 | Like, it's more like, I would say the concepts
01:25:07.660 | in the visual and motor system
01:25:12.460 | and the concept learning system,
01:25:15.580 | which if you cut off the language part,
01:25:17.660 | just what we learn by interacting with the world
01:25:20.460 | and abstractions from that,
01:25:21.940 | that is a prerequisite for any real language understanding.
01:25:26.580 | - So you disagree with Chomsky,
01:25:29.260 | 'cause he says language is at the bottom of everything.
01:25:32.140 | - No, yeah, I disagree with Chomsky completely
01:25:34.860 | on so many levels, from universal grammar to, yeah.
01:25:39.860 | - So that was the paper in Science
01:25:41.660 | on the recursive cortical network.
01:25:44.020 | What other interesting problems are there,
01:25:47.180 | the open problems in brain-inspired approaches
01:25:51.060 | that you're thinking about?
01:25:52.420 | - I mean, everything is open, right?
01:25:53.740 | Like, no problem is solved, solved, right?
01:25:58.460 | First, I think of perception as kind of
01:26:01.420 | the first thing that you have to build,
01:26:05.620 | but the last thing that you will be actually solved.
01:26:08.620 | Because if you do not build perception system
01:26:13.380 | in the right way,
01:26:14.380 | you cannot build concept system in the right way.
01:26:17.340 | So you have to build a perception system,
01:26:19.620 | however wrong that might be,
01:26:21.180 | you have to still build that and learn concepts from there
01:26:24.060 | and then keep iterating.
01:26:26.460 | And finally, perception will get solved fully
01:26:29.740 | when perception, cognition, language,
01:26:31.500 | all those things work together finally.
01:26:34.020 | - So what, and that, so great,
01:26:37.140 | we've talked a lot about perception,
01:26:38.540 | but then maybe on the concept side and like common sense
01:26:42.340 | or just general reasoning side,
01:26:44.660 | is there some intuition you can draw from the brain
01:26:48.220 | about how we can do that?
01:26:50.780 | - So I have this classic example I give.
01:26:55.540 | So suppose I give you a few sentences
01:26:59.060 | and then ask you a question following that sentence.
01:27:01.260 | This is a natural language processing problem, right?
01:27:04.460 | So here it goes.
01:27:06.100 | I'm telling you, Sally pounded a nail on the ceiling.
01:27:10.660 | Okay, that's a sentence.
01:27:13.340 | Now I'm asking you a question,
01:27:14.700 | was the nail horizontal or vertical?
01:27:16.460 | - Vertical.
01:27:18.540 | - Okay, how did you answer that?
01:27:20.140 | - Well, I imagined Sally,
01:27:24.660 | it was kind of hard to imagine what the hell she was doing,
01:27:26.660 | but I imagined a visual of the whole situation.
01:27:31.660 | - Exactly, exactly.
01:27:34.900 | So here, I posed a question in natural language.
01:27:39.060 | The answer to that question was,
01:27:40.820 | you got the answer from actually simulating the scene.
01:27:45.060 | Now I can go more and more detail about,
01:27:47.100 | okay, was Sally standing on something while doing this?
01:27:50.380 | Could she have been standing on a light bulb to do this?
01:27:54.620 | I could ask more and more questions about this
01:27:57.580 | and I can ask, make you simulate the scene
01:28:00.420 | in more and more detail, right?
01:28:02.380 | Where is all that knowledge that you're accessing stored?
01:28:06.700 | It is not in your language system.
01:28:08.460 | It was not just by reading text you got that knowledge.
01:28:12.740 | It is stored from the everyday experiences
01:28:15.420 | that you have had from,
01:28:17.500 | and by the age of five, you have pretty much all of this,
01:28:21.700 | and it is stored in your visual system.
01:28:24.460 | It is stored in your motor system in a way
01:28:27.140 | such that it can be accessed through language.
01:28:29.820 | - Got it.
01:28:31.660 | I mean, right.
01:28:32.500 | So here, the language is just,
01:28:34.300 | almost serves as the query into the whole visual cortex
01:28:36.780 | and that does the whole feedback thing.
01:28:38.620 | But I mean, is all reasoning kind of connected
01:28:42.660 | to the perception system in some way?
01:28:45.980 | - You can do a lot of it.
01:28:47.460 | You can still do a lot of it by quick associations
01:28:51.540 | without having to go into the depth.
01:28:53.860 | And most of the time you will be right, right?
01:28:55.900 | You can just do quick associations,
01:28:57.500 | but I can easily create tricky situations for you
01:29:00.220 | where that quick associations is wrong
01:29:02.340 | and you have to actually run the simulation.
01:29:04.940 | - So the figuring out how these concepts connect,
01:29:08.340 | do I have a good idea of how to do that?
01:29:11.980 | - That's exactly what-
01:29:13.300 | - That's the-
01:29:14.140 | - One of the problems that we are working on.
01:29:15.700 | And the way we are approaching that is basically saying,
01:29:20.100 | okay, you need to,
01:29:21.540 | so the takeaway is that language is simulation control
01:29:26.540 | and your perceptual plus motor system
01:29:31.380 | is building a simulation of the world.
01:29:34.140 | And so that's basically the way we are approaching it.
01:29:37.820 | And the first thing that we built
01:29:39.380 | was a controllable perceptual system.
01:29:42.220 | And we built a schema networks,
01:29:43.860 | which was a controllable dynamic system.
01:29:46.260 | Then we built a concept learning system
01:29:48.660 | that puts all these things together into programs,
01:29:52.260 | as abstractions that you can run and simulate.
01:29:55.420 | And now we are taking the step of connecting it to language.
01:29:58.140 | And it will be very simple examples initially.
01:30:02.660 | It will not be the GPT-3 like examples,
01:30:05.060 | but it will be grounded simulation-based language.
01:30:08.620 | - And for like the querying would be
01:30:12.180 | like question answering kind of thing?
01:30:13.940 | - Correct, correct.
01:30:14.900 | And it will be in some simple world initially
01:30:17.940 | on, you know, but it will be about,
01:30:20.580 | okay, can the system connect the language
01:30:22.940 | and ground it in the right way
01:30:25.300 | and run the right simulations to come up with the answer.
01:30:27.820 | - And the goal is to try to do things that,
01:30:29.700 | for example, GPT-3 couldn't do.
01:30:31.820 | - Correct.
01:30:33.180 | - Speaking of which,
01:30:35.020 | if we could talk about GPT-3 a little bit,
01:30:39.060 | I think it's an interesting thought provoking
01:30:43.740 | set of ideas that OpenAI is pushing forward.
01:30:46.140 | I think it's good for us to talk about the limits
01:30:49.220 | and the possibilities in neural network.
01:30:50.900 | So in general, what are your thoughts
01:30:52.860 | about this recently released very large
01:30:56.700 | 175 billion parameter language model?
01:30:59.980 | - So I haven't directly evaluated it yet.
01:31:03.340 | From what I have seen on Twitter
01:31:05.220 | and other people evaluating it,
01:31:07.020 | it looks very intriguing.
01:31:08.140 | You know, I am very intrigued
01:31:09.540 | by some of the properties it is displaying.
01:31:12.140 | And of course the text generation part of that
01:31:16.380 | was already evident in GPT-2,
01:31:19.260 | you know, that it can generate coherent text
01:31:21.260 | over long distances.
01:31:24.100 | But of course the weaknesses are also pretty visible
01:31:28.860 | in saying that, okay,
01:31:29.820 | it is not really carrying a world state around.
01:31:32.700 | And, you know, sometimes you get sentences like,
01:31:36.180 | I went up the hill to reach the valley or the thing.
01:31:39.380 | You know, some completely incompatible statements.
01:31:43.260 | Or when you're traveling from one place to the other,
01:31:46.260 | it doesn't take into account the time of travel,
01:31:48.020 | things like that.
01:31:48.860 | So those things I think will happen less in GPT-3
01:31:52.780 | because it is trained on even more data.
01:31:55.100 | And so, and it can do even more longer distance coherence.
01:32:00.100 | But it will still have the fundamental limitations
01:32:04.740 | that it doesn't have a world model
01:32:07.780 | and it can't run simulations in its head
01:32:09.780 | to find whether something is true in the world or not.
01:32:13.420 | - Do you think within,
01:32:15.340 | so it's taking a huge amount of text from the internet
01:32:17.900 | and forming a compressed representation.
01:32:20.580 | Do you think in that could emerge
01:32:23.860 | something that's an approximation of a world model,
01:32:27.780 | which essentially could be used for reasoning?
01:32:30.180 | I mean, it's a,
01:32:31.020 | I'm not talking about GPT-3,
01:32:34.260 | I'm talking about GPT-4, 5, and GPT-10.
01:32:37.580 | - Yeah, I mean, they will look more impressive than GPT-3.
01:32:40.780 | So you can, if you take that to the extreme,
01:32:42.980 | then a Markov chain of just first order,
01:32:47.140 | and if you go to,
01:32:49.180 | I'm taking it to the other extreme.
01:32:52.020 | If you read Shannon's book, right?
01:32:55.340 | He has a model of English text,
01:32:58.060 | which is based on first order Markov chains,
01:33:00.500 | second order Markov chains, third order Markov chains,
01:33:02.380 | and saying that, okay, third order Markov chains
01:33:04.260 | look better than first order Markov chains, right?
01:33:07.660 | So does that mean a first order Markov chain
01:33:10.540 | has a model of the world?
01:33:12.820 | Yes, it does.
01:33:14.300 | So yes, in that level,
01:33:16.780 | when you go higher order models
01:33:19.260 | or more sophisticated structure in the model,
01:33:22.700 | like the transformer networks have,
01:33:24.260 | yes, they have a model of the text world,
01:33:26.860 | but that is not a model of the world.
01:33:31.980 | It's a model of the text world,
01:33:33.460 | and it will have interesting properties
01:33:37.620 | and it will be useful,
01:33:39.060 | but just scaling it up is not going to give us AGI
01:33:44.060 | or natural language understanding or meaning.
01:33:48.300 | - The question is whether being forced
01:33:53.500 | to compress a very large amount of text
01:33:56.900 | forces you to construct things that are very much like,
01:34:03.180 | 'cause the ideas of concepts and meaning is a spectrum.
01:34:07.180 | So in order to form that kind of compression,
01:34:12.780 | maybe it will be forced to figure out abstractions
01:34:18.900 | which look awfully a lot like the kind of things
01:34:24.340 | that we think about as concepts,
01:34:27.380 | as world models, as common sense.
01:34:29.660 | Is that possible?
01:34:31.180 | - No, I don't think it is possible
01:34:32.540 | because the information is not there.
01:34:34.460 | - The information is there behind the text, right?
01:34:38.740 | - No, unless somebody has written down all the details
01:34:41.740 | about how everything works in the world
01:34:44.580 | to the absurd amounts like,
01:34:46.660 | okay, it is easier to walk forward than backward,
01:34:50.460 | that you have to open the door to go out of the thing,
01:34:53.100 | doctors wear underwear,
01:34:55.380 | unless all these things somebody has written down somewhere
01:34:57.740 | or somehow the program found it to be useful
01:35:00.380 | for compression from some other text,
01:35:02.580 | the information is not there.
01:35:05.380 | - That's an argument that like text is a lot lower fidelity
01:35:09.020 | than the experience of our physical world.
01:35:13.180 | - Correct, correct.
01:35:14.300 | Picture is worth a thousand words,
01:35:15.860 | like that kind of thing.
01:35:17.580 | - Well, in this case, pictures aren't really,
01:35:19.780 | so the richest aspect of the physical world
01:35:23.740 | isn't even just pictures,
01:35:25.180 | it's the interactivity with the world.
01:35:28.460 | - Exactly.
01:35:29.300 | - It's being able to interact.
01:35:33.900 | It's almost like,
01:35:35.020 | it's almost like if you could interact,
01:35:39.620 | so I disagree, well, maybe I agree with you
01:35:42.500 | that pictures worth a thousand words,
01:35:43.940 | but a thousand--
01:35:45.940 | - It's still, yeah, you could say,
01:35:47.300 | you could capture it with a GPT-X.
01:35:49.940 | - So I wonder if there's some interactive element
01:35:52.140 | where a system could live in text world
01:35:54.300 | where it could be part of the chat,
01:35:57.220 | be part of talking to people.
01:36:00.460 | It's interesting, I mean, fundamentally,
01:36:03.140 | so you're making a statement about the limitation of text.
01:36:07.620 | Okay, so let's say we have a text corpus
01:36:11.500 | that includes basically every experience
01:36:16.500 | we could possibly have.
01:36:18.220 | I mean, just a very large corpus of text
01:36:20.260 | and also interactive components.
01:36:23.300 | I guess the question is whether
01:36:24.660 | the neural network architecture,
01:36:26.220 | these very simple transformers,
01:36:28.700 | but if they had like hundreds of trillions
01:36:32.380 | or whatever comes after trillion parameters,
01:36:36.420 | whether that could store the information needed,
01:36:41.420 | that's architecturally.
01:36:43.940 | Do you have thoughts about the limitation
01:36:46.180 | on that side of things with neural networks?
01:36:49.300 | - I mean, so transformer is still
01:36:51.900 | a feed-forward neural network.
01:36:54.420 | It has a very interesting architecture
01:36:57.140 | which is good for text modeling
01:36:59.060 | and probably some aspects of video modeling,
01:37:01.660 | but it is still a feed-forward architecture.
01:37:03.980 | - You believe in the feedback mechanism, recursion.
01:37:07.020 | - Oh, and also causality,
01:37:10.460 | being able to do counterfactual reasoning,
01:37:12.860 | being able to do interventions,
01:37:15.020 | which is actions in the world.
01:37:18.900 | So all those things require different kinds
01:37:22.020 | of models to be built.
01:37:24.180 | I don't think transformers captures that family.
01:37:28.900 | It is very good at statistical modeling of text
01:37:32.420 | and it will become better and better with more data,
01:37:36.540 | bigger models, but that is only going to get so far.
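
As a rough illustration of the distinction being drawn here, the sketch below contrasts conditioning with intervention in a toy structural causal model (the rain/sprinkler example and all names are illustrative assumptions, not from the conversation). A purely statistical predictor only ever answers the first kind of query; counterfactual reasoning and acting in the world need the second.

```python
import random

def sample_world(do_sprinkler=None):
    """One draw from a toy structural causal model: rain -> sprinkler -> wet grass."""
    rain = random.random() < 0.3
    if do_sprinkler is None:
        sprinkler = random.random() < (0.1 if rain else 0.6)  # normal mechanism
    else:
        sprinkler = do_sprinkler                              # intervention: incoming edge is cut
    wet = rain or sprinkler
    return rain, sprinkler, wet

N = 100_000
observational = [sample_world() for _ in range(N)]

# Observational query: P(rain | we SAW the sprinkler on).
seen_on = [rain for rain, sprinkler, wet in observational if sprinkler]
p_rain_seen = sum(seen_on) / len(seen_on)

# Interventional query: P(rain | do(sprinkler := on)) -- we FORCED it on.
forced = [sample_world(do_sprinkler=True) for _ in range(N)]
p_rain_do = sum(rain for rain, sprinkler, wet in forced) / N

print(round(p_rain_seen, 2))  # well below 0.3: seeing it on is evidence against rain
print(round(p_rain_do, 2))    # about 0.3: forcing it on tells you nothing about rain
```
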
01:37:40.740 | Finally, when you, so I had this joke on Twitter
01:37:44.780 | saying that, "Hey, this is a model that has read
01:37:47.900 | "all of quantum mechanics and theory of relativity
01:37:52.100 | "and we are asking it to do text completion
01:37:54.260 | "or we are asking it to solve simple puzzles."
01:37:57.420 | That's, when you have AGI,
01:38:00.980 | that's not what you ask the system to do.
01:38:02.460 | If it does, we'll ask the system to do experiments,
01:38:05.660 | and come up with hypotheses
01:38:09.380 | and revise the hypotheses based on evidence
01:38:11.820 | from experiments, all those things, right?
01:38:13.780 | Those are the things that we want the system to do
01:38:15.460 | when we have AGI, not solve simple puzzles.
01:38:20.100 | - Like impressive demo, somebody generating a red button
01:38:23.340 | in HTML.
01:38:24.420 | - Which are all useful,
01:38:27.220 | like there's no dissing the usefulness of it.
01:38:30.060 | - So I get, by the way, I'm playing a little bit
01:38:32.620 | of a devil's advocate, so calm down internet.
01:38:36.460 | So I just, I'm curious almost,
01:38:40.920 | in which ways will a dumb
01:38:44.700 | but large neural network surprise us?
01:38:48.260 | - Yeah.
01:38:49.100 | I'm, it's kind of your,
01:38:51.020 | I completely agree with your intuition,
01:38:53.480 | it's just that I don't want to dogmatically,
01:38:57.540 | like 100% put all the chips there.
01:39:01.420 | We've been surprised so much,
01:39:03.580 | even the current GPT-2 and 3 are so surprising.
01:39:08.320 | - Yeah.
01:39:09.900 | - The self-play mechanisms of AlphaZero
01:39:13.300 | are really surprising.
01:39:15.140 | And I, reinforcement, the fact that reinforcement learning
01:39:19.860 | works at all to me is really surprising.
01:39:22.060 | The fact that neural networks work at all
01:39:23.740 | is quite surprising, given how nonlinear the space is,
01:39:27.820 | the fact that it's able to find local minima
01:39:30.860 | that are at all reasonable, it's very surprising.
01:39:33.500 | So it's, I wonder sometimes
01:39:37.100 | whether us humans just want it to not,
01:39:44.300 | for AGI not to be such a dumb thing.
01:39:46.960 | (laughing)
01:39:48.020 | So I just,
01:39:48.860 | 'cause exactly what you're saying is like,
01:39:52.820 | the ideas of concepts and be able to reason
01:39:54.900 | with those concepts and connect those concepts
01:39:57.620 | in like hierarchical ways,
01:40:00.260 | and then to be able to have world models.
01:40:03.500 | I mean, just everything we're describing
01:40:05.580 | in human language in this poetic way seems to make sense,
01:40:09.460 | that that is what intelligence and reasoning are like.
01:40:12.140 | I wonder if at the core of it, it could be much dumber.
01:40:15.580 | - Well, finally it is still connections
01:40:18.500 | and messages passing over them, right?
01:40:20.220 | - Right.
01:40:21.060 | - So in that way it's dumb.
01:40:22.220 | (laughing)
01:40:23.780 | - So I guess the recursion, the feedback mechanism,
01:40:27.660 | that does seem to be a fundamental kind of thing.
01:40:30.200 | Yeah, yeah.
01:40:32.820 | The idea of concepts, also memory.
01:40:36.140 | - Correct.
01:40:36.960 | Yeah, having an episodic memory.
01:40:38.620 | - Yeah.
01:40:39.460 | That seems to be an important thing.
01:40:41.460 | - So how do we get memory?
01:40:43.140 | - So yeah, we have another piece of work
01:40:45.180 | which came out recently on how do you form episodic memories
01:40:49.820 | and form abstractions from them?
01:40:52.260 | And we haven't figured out all the connections of that
01:40:55.980 | to the overall cognitive architecture, but-
01:40:58.340 | - Well, yeah, what are your ideas
01:41:00.340 | about how you could have episodic memory?
01:41:03.100 | - So at least it's very clear that you need
01:41:06.260 | to have two kinds of memory, right?
01:41:07.660 | That's very, very clear, right?
01:41:09.660 | There are things that happen
01:41:12.340 | as statistical patterns in the world,
01:41:16.260 | but then there is the one timeline of things
01:41:19.580 | that happen only once in your life, right?
01:41:22.100 | And this day is not going to happen ever again.
01:41:24.620 | And that needs to be stored as just a stream of strings,
01:41:29.620 | right?
01:41:32.180 | This is my experience.
01:41:33.140 | And then the question is about how do you take
01:41:37.420 | that experience and connect it to the statistical part
01:41:39.820 | of it?
01:41:40.660 | How do you now say that, okay, I experienced this thing.
01:41:43.540 | Now I want to be careful about similar situations.
01:41:47.460 | And so you need to be able to index that similarity
01:41:53.140 | using your other giant statistics,
01:41:56.420 | the model of the world that you have learned.
01:41:59.180 | Although the situation came from the episode,
01:42:01.420 | you need to be able to index it through the other one.
01:42:03.340 | So the episodic memory being implemented
01:42:08.340 | as an indexing over the other model that you're building.
01:42:14.220 | - So the memories remain and they're an index
01:42:20.260 | into this, like the statistical thing that you formed.
01:42:24.820 | - Yeah, statistical causal structural model
01:42:26.780 | that you've built over time.
01:42:28.900 | So basically the idea is that the hippocampus
01:42:32.940 | is just storing a sequence of pointers
01:42:37.940 | that happens over time.
01:42:42.380 | And then whenever you want to reconstitute that memory
01:42:45.060 | and evaluate the different aspects of it,
01:42:49.180 | whether it was good, bad,
01:42:50.420 | do I need to encounter the situation again?
01:42:52.620 | You need the cortex to re-instantiate,
01:42:56.540 | to replay that memory.
01:42:57.660 | - So how do you find that memory?
01:43:00.500 | Which direction is the important direction?
01:43:03.300 | - Both directions are again bidirectional.
01:43:06.060 | - I guess, how do you retrieve the memory?
01:43:09.860 | - So this is again hypothesis, right?
01:43:11.580 | We're making this up.
01:43:12.420 | So when you come to a new situation,
01:43:15.900 | your cortex is doing inference over the new situation.
01:43:21.340 | And then of course, hippocampus is connected
01:43:23.940 | to different parts of the cortex.
01:43:25.980 | And you have this deja vu situation, right?
01:43:29.700 | Okay, I have seen this thing before.
01:43:32.060 | And then in the hippocampus,
01:43:35.420 | you can have an index of,
01:43:36.940 | okay, this is when it happened as a timeline.
01:43:40.020 | Then you can use the hippocampus
01:43:44.540 | to drive the similar timelines to say,
01:43:47.780 | now, rather than being driven
01:43:50.460 | by my current input stimuli,
01:43:53.060 | I am going back in time
01:43:54.620 | and rewinding my experience from there.
01:43:56.820 | - Replaying it.
01:43:57.660 | - But putting back into the cortex.
01:43:58.940 | And then putting it back into the cortex,
01:44:00.980 | of course affects what you're going to see next
01:44:03.820 | in your current situation.
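
A very loose sketch of the indexing-and-replay idea described above, assuming made-up data structures purely for illustration (this is not code from Dileep's or Vicarious's work): the "hippocampus" stores an ordered timeline of pointers into codes produced by a learned "cortex" model, retrieval is a similarity lookup on the current situation, and replay walks forward along the stored timeline.

```python
import numpy as np

class EpisodicIndex:
    """Toy 'hippocampus': a timeline of pointers into world-model codes."""

    def __init__(self, encode):
        self.encode = encode      # stand-in 'cortex': situation -> feature code
        self.codes = []           # the codes the pointers refer to
        self.timeline = []        # one-time-only episode: ordered indices into codes

    def experience(self, situation):
        """Store one moment: encode it, keep only a pointer in the timeline."""
        self.codes.append(self.encode(situation))
        self.timeline.append(len(self.codes) - 1)

    def recall(self, situation, horizon=3):
        """'Deja vu': find the most similar stored moment, then replay what followed."""
        query = self.encode(situation)
        sims = [float(np.dot(query, code)) for code in self.codes]
        best = int(np.argmax(sims))                 # pointer to the best-matching memory
        start = self.timeline.index(best)
        return [self.codes[i] for i in self.timeline[start:start + horizon]]

# A stand-in 'world model' that just assigns each situation a fixed random code.
rng = np.random.default_rng(0)
codebook = {}
def fake_world_model(situation):
    if situation not in codebook:
        codebook[situation] = rng.standard_normal(8)
    return codebook[situation]

memory = EpisodicIndex(fake_world_model)
for moment in ["enter cafe", "order coffee", "spill coffee", "clean up"]:
    memory.experience(moment)

replay = memory.recall("spill coffee")  # retrieves that moment and what came next
print(len(replay))                      # 2
```

The property that matches the description is that the episode itself holds only pointers; reconstituting the memory goes back through the same model that handles ordinary perception.
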
01:44:05.100 | - Got it, yeah.
01:44:05.980 | So that's the whole thing,
01:44:07.540 | having a world model and then, yeah.
01:44:10.340 | Connecting to the perception.
01:44:11.820 | Yeah, it does seem to be that that's what's happening.
01:44:14.420 | It'd be, on the neural network side,
01:44:17.660 | it's interesting to think of how we actually do that.
01:44:21.700 | - Yeah, yeah.
01:44:23.380 | - To have a knowledge base.
01:44:24.900 | - Yes, it is possible that you can put many
01:44:27.340 | of these structures into neural networks
01:44:30.420 | and we will find ways of combining properties
01:44:34.740 | of neural networks and graphical models.
01:44:38.420 | So, I mean, it's already started happening.
01:44:41.580 | Graph neural networks are kind of a merge between them.
01:44:44.620 | And there will be more of that.
01:44:46.660 | So, but to me, the direction is pretty clear.
01:44:50.740 | I mean, looking at biology and the
01:44:53.660 | evolutionary history of intelligence,
01:44:57.820 | it is pretty clear that, okay,
01:45:00.420 | what we need is more structure in the models
01:45:03.580 | and modeling of the world
01:45:05.660 | and supporting dynamic inference.
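
For context on the graph-neural-network remark a moment ago, here is a minimal message-passing layer (a generic sketch with invented names, not any particular published architecture): each node updates its state from an aggregate of its neighbors' states, which mirrors message passing in a graphical model while keeping the learned, differentiable transforms of a neural network.

```python
import numpy as np

def message_passing_layer(node_states, edges, W_self, W_neigh):
    """One round of message passing over a graph.

    node_states: (num_nodes, dim) array of current node features
    edges: list of (src, dst) pairs; messages flow src -> dst
    W_self, W_neigh: (dim, dim) weight matrices (learned in a real GNN, random here)
    """
    num_nodes, _ = node_states.shape
    incoming = np.zeros_like(node_states)
    counts = np.zeros(num_nodes)
    for src, dst in edges:
        incoming[dst] += node_states[src]   # collect messages from neighbors
        counts[dst] += 1
    counts = np.maximum(counts, 1)          # avoid division by zero for isolated nodes
    mean_message = incoming / counts[:, None]
    # Combine each node's own state with its aggregated messages, then a nonlinearity.
    return np.tanh(node_states @ W_self + mean_message @ W_neigh)

# Tiny example: a 4-node chain with 3-dimensional node states, two stacked rounds.
rng = np.random.default_rng(0)
states = rng.standard_normal((4, 3))
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
W_self, W_neigh = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))

for _ in range(2):                          # more rounds = information travels further
    states = message_passing_layer(states, edges, W_self, W_neigh)
print(states.shape)                         # (4, 3)
```

Belief propagation in a graphical model has the same skeleton; the difference is that here the message and update functions are learned rather than derived from fixed conditional probability tables.
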
01:45:08.100 | - Well, let me ask you, there's a guy named Elon Musk.
01:45:12.820 | There's a company called Neuralink
01:45:14.540 | and there's a general field called
01:45:15.980 | Brain-Computer Interfaces.
01:45:18.300 | It's kind of an interface between your two loves.
01:45:23.140 | - Yes.
01:45:23.980 | - The brain and the intelligence.
01:45:25.700 | So, there's like very direct applications
01:45:28.220 | of Brain-Computer Interfaces for people
01:45:31.300 | with different conditions, more in the short term.
01:45:33.620 | But there's also these sci-fi futuristic kinds of ideas
01:45:36.620 | of AI systems being able to communicate
01:45:41.620 | in a high bandwidth way with the brain, bi-directional.
01:45:46.540 | What are your thoughts about Neuralink
01:45:49.300 | and BCI in general as a possibility?
01:45:52.500 | - So, I think BCI is a cool research area.
01:45:56.540 | And in fact, when I got interested in brains initially,
01:46:01.540 | when I was enrolled at Stanford
01:46:03.260 | and when I got interested in brains,
01:46:04.700 | it was through a brain-computer interface talk
01:46:09.060 | that Krishna Shenoy gave.
01:46:10.540 | That's when I even started thinking about the problem.
01:46:13.140 | So, it is definitely a fascinating research area
01:46:16.500 | and the applications are enormous, right?
01:46:19.740 | So, there's the science fiction scenario
01:46:22.380 | of brains directly communicating.
01:46:24.460 | Let's keep that aside for the time being.
01:46:27.700 | Even just the intermediate milestones they're pursuing,
01:46:30.940 | which are very reasonable as far as I can see,
01:46:33.980 | being able to control an external limb
01:46:36.420 | using direct connections from the brain
01:46:40.700 | and being able to write things into the brain.
01:46:43.740 | So, those are all good steps to take
01:46:47.940 | and they have enormous applications.
01:46:50.620 | People losing limbs being able to control prosthetics,
01:46:54.620 | quadriplegics being able to control something,
01:46:57.340 | so, and therapeutics.
01:46:59.180 | And I also know about another company
01:47:01.220 | working in this space called Paradromics.
01:47:03.820 | They're based on a different electrode array,
01:47:08.260 | but trying to attack some of the same problems.
01:47:10.260 | So, I think it's a very-
01:47:11.980 | - Also surgery?
01:47:13.780 | - Correct, surgically implanted electrodes, yeah.
01:47:16.580 | So, yeah, I think of it as a very, very promising field,
01:47:21.460 | especially when it is helping people
01:47:23.780 | overcome some limitations.
01:47:25.580 | Now, at some point, of course,
01:47:27.660 | it will advance the level of being able to communicate.
01:47:31.460 | - How hard is that problem, do you think?
01:47:33.380 | Like, so, okay, let's say we magically solve
01:47:37.580 | what I think is a really hard problem
01:47:39.980 | of doing all of this safely.
01:47:42.260 | - Yeah.
01:47:43.300 | - So, like, being able to connect electrodes,
01:47:46.940 | and not just thousands, but like millions to the brain.
01:47:50.660 | - I think it's very, very hard
01:47:52.100 | because you also do not know
01:47:54.620 | what will happen to the brain with that, right?
01:47:57.660 | In the sense of how does the brain adapt
01:47:59.260 | to something like that?
01:48:00.100 | - And it's, you know, as we were learning,
01:48:02.220 | the brain is quite, in terms of neuroplasticity,
01:48:06.220 | is pretty malleable.
01:48:07.460 | - Correct.
01:48:08.300 | - So, it's gonna adjust.
01:48:09.260 | - Correct.
01:48:10.300 | - So, the machine learning side,
01:48:11.620 | the computer side is gonna adjust,
01:48:13.260 | and then the brain's gonna adjust.
01:48:14.660 | - Exactly, and then what soup does this land us into?
01:48:17.460 | - The kind of hallucinations you might get from this
01:48:21.900 | that might be pretty intense.
01:48:23.180 | - Yeah, yeah.
01:48:24.620 | - Just connecting to all of Wikipedia.
01:48:27.100 | It's interesting whether we need to be able to figure out
01:48:30.780 | the basic protocol of the brain's communication schemes
01:48:35.140 | in order to get them to, the machine and the brain to talk.
01:48:38.940 | 'Cause another possibility is the brain
01:48:41.300 | actually just adjusts to whatever the heck
01:48:42.980 | the computer is doing.
01:48:43.820 | - Exactly, that's the way I think,
01:48:45.740 | I find that to be a more promising way.
01:48:48.140 | It's basically saying, you know,
01:48:50.380 | okay, attach electrodes to some part of the cortex, okay?
01:48:53.340 | And maybe if it is done from birth,
01:48:56.980 | the brain will adapt to it such that, you know,
01:48:59.060 | that part is not damaged, it was not used for anything.
01:49:01.580 | These electrodes are attached there, right?
01:49:03.020 | And now, you train that part of the brain
01:49:06.780 | to do this high bandwidth communication
01:49:08.580 | between something else, right?
01:49:10.540 | And if you do it like that, then it is the brain adapting to it,
01:49:15.260 | and of course, your external system is designed
01:49:17.340 | such that it is adaptable.
01:49:18.580 | Just like we design computers or mouse, keyboard,
01:49:22.420 | all of them to be interacting with humans.
01:49:26.860 | So of course, that feedback system is designed
01:49:29.540 | to be human compatible, but now it is not trying to record
01:49:34.540 | from all of the brain and, you know, having
01:49:39.780 | two systems trying to adapt to each other.
01:49:41.660 | It's the brain adapting in one direction.
01:49:44.340 | - That's fascinating.
01:49:45.660 | The brain is connected to like the internet.
01:49:48.140 | It's connected.
01:49:49.140 | - Yeah.
01:49:50.340 | - Just imagine, it's connecting it to Twitter
01:49:52.260 | and just taking that stream of information.
01:49:56.500 | Yeah, but again, if we take a step back,
01:50:00.620 | I don't know what your intuition is.
01:50:02.460 | I feel like that is not as hard of a problem
01:50:06.700 | as doing it safely.
01:50:10.140 | There's a huge barrier to surgery.
01:50:14.980 | - Right.
01:50:15.820 | - 'Cause the biological system,
01:50:17.620 | it's a mush of like weird stuff.
01:50:20.900 | - Correct.
01:50:21.740 | So that, the surgery part of it, biology part of it,
01:50:25.100 | the long-term repercussions part of it,
01:50:27.820 | again, I don't know what else will,
01:50:29.620 | we often find after a long time in biology that,
01:50:35.580 | okay, that idea was wrong, right?
01:50:37.620 | So people used to cut off this,
01:50:40.260 | the gland called the thymus or something.
01:50:43.780 | And then they found that,
01:50:45.820 | oh no, that actually causes cancer.
01:50:48.300 | (both laughing)
01:50:50.660 | - And then there's a subtle,
01:50:51.980 | like millions of variables involved.
01:50:54.140 | But this whole process, the nice thing,
01:50:56.660 | just like, again, with Elon, just like colonizing Mars,
01:51:00.980 | seems like a ridiculously difficult idea.
01:51:03.500 | But in the process of doing it,
01:51:05.500 | we might learn a lot about the biology,
01:51:08.420 | the neurobiology of the brain,
01:51:10.380 | the neuroscience side of things.
01:51:11.900 | It's like, if you wanna learn something,
01:51:14.380 | do the most difficult version of it.
01:51:16.780 | - Yeah.
01:51:17.620 | - And see what you learn.
01:51:18.540 | - The intermediate steps that they are taking
01:51:20.780 | sounded all very reasonable to me.
01:51:22.500 | - Yeah, it's great.
01:51:24.180 | Well, but like everything with Elon
01:51:26.260 | is the timeline seems insanely fast, so.
01:51:28.940 | - Right.
01:51:29.780 | - That's the only awful question.
01:51:33.220 | - Well, we've been talking about cognition a little bit,
01:51:36.220 | so like reasoning.
01:51:37.380 | We haven't mentioned the other C word,
01:51:40.460 | which is consciousness.
01:51:42.380 | Do you ever think about that one?
01:51:43.940 | Is that useful at all in this whole context
01:51:47.340 | of what it takes to create an intelligent reasoning being?
01:51:52.340 | Or is that completely outside of your,
01:51:55.780 | like the engineering perspective of intelligence?
01:51:58.540 | - It is not outside the realm,
01:52:00.180 | but it doesn't on a day-to-day basis inform what we do,
01:52:05.180 | but it's more, so in many ways,
01:52:08.180 | the company name is connected to this idea of consciousness.
01:52:12.820 | - What's the company name?
01:52:13.860 | - Vicarious, so Vicarious is the company name.
01:52:16.620 | So what does Vicarious mean?
01:52:20.180 | At the first level, it is about modeling the world,
01:52:25.100 | and it is internalizing the external actions.
01:52:29.460 | So you interact with the world
01:52:31.500 | and learn a lot about the world.
01:52:33.060 | And now, after having learned a lot about the world,
01:52:36.500 | you can run those things in your mind
01:52:40.180 | without actually having to act in the world.
01:52:42.820 | So you can run things vicariously, just in your brain.
01:52:47.180 | And similarly, you can experience another person's thoughts
01:52:51.460 | by having a model of how that person works
01:52:54.660 | and running that, putting yourself
01:52:58.020 | in some other person's shoes.
01:52:59.740 | So that is being vicarious.
01:53:01.380 | Now, it's the same modeling apparatus
01:53:04.580 | that you're using to model the external world
01:53:06.940 | or some other person's thoughts.
01:53:08.940 | You can turn it to yourself.
01:53:11.260 | You can, if that same modeling thing
01:53:13.900 | is applied to your own modeling apparatus,
01:53:17.620 | then that is what gives rise to consciousness, I think.
01:53:21.140 | - Well, that's more like self-awareness.
01:53:23.660 | There's the hard problem of consciousness,
01:53:25.540 | which is when the model feels like something,
01:53:30.540 | when this whole process is like,
01:53:34.760 | it's like you really are in it.
01:53:39.180 | You feel like an entity in this world.
01:53:41.860 | Not just you know that you're an entity,
01:53:44.100 | but it feels like something to be that entity.
01:53:50.580 | And thereby, we attribute this,
01:53:53.220 | then it starts to be where something
01:53:56.260 | that has consciousness can suffer.
01:53:58.500 | You start to have these kinds of things
01:54:00.180 | that we can reason about that.
01:54:02.340 | - Yes.
01:54:03.180 | - It's much heavier.
01:54:06.460 | It seems like there's much greater cost of your decisions.
01:54:11.020 | And mortality is tied up into that.
01:54:14.540 | The fact that these things end.
01:54:16.940 | - Right.
01:54:18.260 | - First of all, I end at some point,
01:54:20.340 | and then other things end.
01:54:22.460 | And that somehow seems to be,
01:54:25.940 | at least for us humans, a deep motivator.
01:54:30.260 | - Yes.
01:54:31.380 | - And that idea of motivation in general,
01:54:34.300 | we talk about goals in AI,
01:54:35.820 | but goals aren't quite the same thing as our mortality.
01:54:40.820 | It feels like first of all, humans don't have a goal.
01:54:47.540 | And they just kind of create goals at different levels.
01:54:50.540 | They like make up goals,
01:54:52.980 | because we're terrified by the mystery
01:54:56.180 | of the thing that gets us all.
01:54:59.020 | So we make these goals up.
01:55:02.020 | So we're like a goal generation machine,
01:55:05.180 | as opposed to a machine which optimizes
01:55:07.980 | the trajectory towards a singular goal.
01:55:10.860 | So it feels like that's an important part of cognition,
01:55:15.700 | that whole mortality thing.
01:55:17.500 | - Well, it is a part of human cognition,
01:55:21.500 | but there is no reason for that mortality
01:55:26.500 | to come to the equation for an artificial system,
01:55:31.500 | because we can copy the artificial system.
01:55:35.540 | The problem with humans is that I can't clone you.
01:55:38.300 | Even if I clone you as the hardware,
01:55:44.220 | your experience that was stored in your brain,
01:55:47.380 | or your episodic memory,
01:55:49.660 | all those will not be captured in the new clone.
01:55:52.060 | But that's not the same with an AI system.
01:55:56.100 | - But it's also possible that the thing
01:56:01.020 | that you mentioned with us humans
01:56:03.820 | is actually of fundamental importance for intelligence.
01:56:06.940 | So the fact that you can copy an AI system
01:56:10.380 | means that that AI system is not yet an AGI.
01:56:16.140 | - So if you look at existence proof,
01:56:18.780 | if we reason based on existence proof,
01:56:21.860 | you could say that it doesn't feel like death
01:56:23.980 | is a fundamental property of an intelligent system,
01:56:27.300 | but we can't yet give an example
01:56:31.900 | of an immortal intelligent being, we don't have those.
01:56:36.500 | It's very possible that a fundamental property
01:56:42.540 | of intelligence is being a thing that has a deadline for itself.
01:56:47.540 | (both laughing)
01:56:48.420 | - You can think of it like this.
01:56:49.860 | Suppose you invent a way to freeze people for a long time.
01:56:54.500 | It's not dying, right?
01:56:56.460 | So you can be frozen and woken up
01:56:59.780 | thousands of years from now.
01:57:01.940 | So it's no fear of death.
01:57:03.300 | - Well, no, you're still, it's not about time.
01:57:08.980 | It's about the knowledge that it's temporary.
01:57:12.560 | And that aspect of it, the finiteness of it,
01:57:17.560 | I think creates a kind of urgency.
01:57:21.680 | - Correct, for us, for humans.
01:57:23.640 | - Yeah, for humans.
01:57:24.480 | - Yes, and that is part of our drives.
01:57:28.040 | But, and that's why I'm not too worried about AI,
01:57:32.560 | having motivations to kill all humans
01:57:38.040 | and those kinds of things.
01:57:39.400 | Why, just wait.
01:57:40.800 | (both laughing)
01:57:43.460 | Why do you need to do that?
01:57:45.240 | - Yeah, I've never heard that before.
01:57:46.640 | That's a good point.
01:57:48.480 | Yeah, just murder seems like a lot of work.
01:57:53.040 | Just wait it out.
01:57:54.260 | They'll probably hurt themselves.
01:57:57.360 | Let me ask you, people often kind of wonder,
01:58:01.520 | world-class researchers such as yourself,
01:58:05.520 | what kind of books, technical, fiction, philosophical,
01:58:09.920 | were, had an impact on you in your life
01:58:14.040 | and maybe ones you could possibly recommend
01:58:19.040 | that others read, maybe if you have three books
01:58:21.600 | that pop into mind.
01:58:23.280 | - Yeah, so I definitely liked Judea Pearl's book,
01:58:27.280 | Probabilistic Reasoning in Intelligent Systems.
01:58:29.840 | It's a very deep technical book,
01:58:32.840 | but what I liked is that,
01:58:34.680 | so there are many places where you can learn
01:58:37.080 | about probabilistic graphical models from.
01:58:39.720 | But throughout this book, Judea Pearl kind of sprinkles
01:58:43.160 | his philosophical observations and he thinks about,
01:58:46.880 | connects it to how the brain thinks and attention
01:58:50.040 | and resources, all those things.
01:58:51.520 | So that whole thing makes it more interesting to read.
01:58:55.280 | - He emphasizes the importance of causality.
01:58:57.960 | - So that was in his later book.
01:58:59.520 | So this was the first book,
01:59:00.840 | Probabilistic Reasoning in Intelligent Systems.
01:59:02.560 | He mentions causality, but he hadn't really
01:59:05.760 | sunk his teeth into, like, you know,
01:59:07.760 | how do you actually formalize--
01:59:09.200 | - Got it.
01:59:10.040 | - Yeah, and the second book, Causality,
01:59:12.520 | it was the one in 2000, that one is really hard.
01:59:16.240 | So I wouldn't recommend that.
01:59:17.840 | - Oh yeah, so that looks at the, like,
01:59:19.320 | the mathematical, like, his model of--
01:59:22.680 | - Do-calculus.
01:59:23.520 | - Do-calculus, yeah, it was pretty dense mathematically.
01:59:25.680 | - Right, right, right.
01:59:27.120 | The Book of Why is definitely more enjoyable.
01:59:29.040 | - Oh, for sure.
01:59:29.880 | - Yeah, so yeah, so I would recommend
01:59:32.000 | Probabilistic Reasoning in Intelligent Systems.
01:59:34.400 | Another book I liked was one from Doug Hofstadter.
01:59:39.320 | This was a long time ago.
01:59:40.160 | He has a book, he had a book, I think,
01:59:41.800 | called, it was called The Mind's I.
01:59:43.600 | It was probably Hofstadter and Daniel Dennett together.
01:59:48.600 | - Yeah, and I actually was, I bought that book.
01:59:53.360 | It's on my shelf, I haven't read it yet.
01:59:55.040 | But I couldn't get an electronic version of it,
01:59:58.440 | which is annoying, 'cause you read everything on Kindle.
02:00:01.840 | - Oh, okay.
02:00:02.680 | - So you have to actually purchase the physical.
02:00:05.080 | It's like one of the only physical books I have,
02:00:07.200 | 'cause anyway, a lot of people recommended it highly, so.
02:00:10.880 | - Yeah, and the third one I would definitely
02:00:13.800 | recommend reading is, this is not a technical book.
02:00:17.800 | It is history.
02:00:19.520 | It's called, the name of the book, I think,
02:00:22.320 | is The Bishop's Boys.
02:00:23.480 | It's about the Wright brothers and their path
02:00:28.520 | and how it was, there are multiple books on this topic
02:00:33.520 | and all of them are great.
02:00:35.640 | It's fascinating how flight was treated
02:00:40.640 | as an unsolvable problem.
02:00:44.680 | And also, what aspects did people emphasize?
02:00:49.040 | People thought, oh, it is all about just powerful engines.
02:00:54.480 | Just need to have powerful, lightweight engines.
02:00:58.000 | And so, some people thought of it as,
02:01:01.960 | how far can we just throw the thing?
02:01:04.520 | Just throw it.
02:01:05.360 | - Like a catapult.
02:01:06.360 | - Yeah, so it's a very fascinating,
02:01:09.600 | and even after they made the invention,
02:01:13.200 | people not believing it.
02:01:14.400 | - The social aspect of it, yeah.
02:01:17.040 | - The social aspect, it's very fascinating.
02:01:20.160 | - Do you draw any parallels with how birds fly?
02:01:25.400 | So there's the natural approach to flight
02:01:28.440 | and then there's the engineered approach.
02:01:30.600 | Do you see the same kind of thing with the brain
02:01:34.080 | and our trying to engineer intelligence?
02:01:36.440 | - Yeah, it's a good analogy to have.
02:01:40.600 | Of course, all analogies have their--
02:01:43.720 | - Limits, for sure.
02:01:46.040 | - So people in AI often use airplanes as an example of,
02:01:51.040 | hey, we didn't learn anything from birds.
02:01:54.200 | Look, but the funny thing is that,
02:01:57.520 | and the saying is, airplanes don't flap wings.
02:02:01.760 | This is what they say.
02:02:03.280 | The funny thing and the ironic thing is that,
02:02:05.680 | that you don't need to flap to fly
02:02:09.000 | is something the Wright Brothers found by observing birds.
02:02:12.240 | (both laughing)
02:02:14.920 | In their notebook, in some of these books,
02:02:18.800 | they show their notebook drawings,
02:02:20.840 | they make detailed notes about buzzards
02:02:24.120 | just soaring over thermals.
02:02:27.040 | And they basically say, look,
02:02:28.800 | flapping is not the important,
02:02:30.600 | propulsion is not the important problem to solve here.
02:02:33.320 | We want to solve control.
02:02:35.520 | And once you solve control,
02:02:37.320 | propulsion will fall into place.
02:02:38.880 | All of these are people,
02:02:40.840 | they realize this by observing birds.
02:02:43.040 | (both laughing)
02:02:44.560 | - Beautifully put.
02:02:46.120 | That's actually brilliant.
02:02:47.600 | 'Cause people do use that analogy a lot.
02:02:49.360 | I'm gonna have to remember that one.
02:02:51.400 | Do you have advice for people
02:02:53.640 | interested in artificial intelligence,
02:02:55.160 | like young folks today?
02:02:56.240 | I talk to undergraduate students all the time.
02:02:59.360 | Interested in neuroscience,
02:03:00.560 | interested in understanding how the brain works.
02:03:03.720 | Is there advice you would give them about their career,
02:03:06.880 | maybe about their life in general?
02:03:09.600 | - Sure.
02:03:10.440 | I think every piece of advice
02:03:12.520 | should be taken with a pinch of salt, of course.
02:03:14.880 | Because each person is different,
02:03:17.240 | their motivations are different.
02:03:18.520 | But I can definitely say,
02:03:21.360 | if your goal is to understand the brain
02:03:24.520 | from the angle of wanting to build one,
02:03:26.880 | then being an experimental neuroscientist
02:03:32.760 | might not be the way to go about it.
02:03:34.600 | A better way to pursue it might be through computer science,
02:03:41.960 | electrical engineering, machine learning, and AI.
02:03:44.360 | And of course, you have to study up the neuroscience,
02:03:46.360 | but that you can do on your own.
02:03:48.480 | If you are more attracted by finding something intriguing,
02:03:54.080 | discovering something intriguing about the brain,
02:03:56.600 | then of course it is better to be an experimentalist.
02:04:00.320 | So find that motivation, what are you intrigued by?
02:04:02.480 | And of course, find your strengths too.
02:04:04.320 | Some people are very good experimentalists,
02:04:07.680 | and they enjoy doing that.
02:04:10.120 | - And it's interesting to see which department,
02:04:13.640 | if you're picking in terms of your education path,
02:04:17.520 | whether to go with, at MIT it's brain and computer,
02:04:22.520 | no, it'd be CS.
02:04:26.960 | - Yeah.
02:04:27.920 | - Brain and cognitive sciences, yeah.
02:04:30.240 | Or the CS side of things.
02:04:32.960 | And actually, the brain folks, the neuroscience folks
02:04:36.400 | are more and more now embracing of learning TensorFlow,
02:04:41.600 | PyTorch, right?
02:04:43.080 | They see the power of trying to engineer ideas
02:04:48.080 | that they get from the brain,
02:04:52.600 | and then explore how those could be used
02:04:54.560 | to create intelligent systems.
02:04:56.640 | So that might be the right department actually.
02:04:59.160 | - Yeah.
02:05:00.600 | So this was a question in one of the
02:05:03.240 | Redwood Neuroscience Institute workshops
02:05:05.920 | that Jeff Hawkins organized almost 10 years ago.
02:05:09.960 | This question was put to a panel, right?
02:05:11.600 | What should be the undergrad major
02:05:13.920 | you should take if you want to understand the brain?
02:05:16.200 | And the majority opinion in that one
02:05:19.040 | was electrical engineering.
02:05:21.040 | - Interesting.
02:05:23.680 | - Because, I mean, I'm an EE undergrad,
02:05:25.840 | so I got lucky in that way.
02:05:27.240 | But I think it does have some of the right ingredients,
02:05:31.320 | because you learn about circuits.
02:05:33.400 | You learn about how you can construct circuits
02:05:36.960 | to do functions.
02:05:40.200 | You learn about microprocessors.
02:05:42.640 | You learn information theory.
02:05:43.760 | You learn signal processing.
02:05:45.520 | You learn continuous math.
02:05:46.760 | So in that way, it's a good step
02:05:50.960 | if you want to go to computer science or neuroscience.
02:05:54.000 | It's a good step.
02:05:55.080 | - The downside, you're more likely to be forced to use MATLAB.
02:06:00.080 | (laughing)
02:06:05.000 | - One of the interesting things about,
02:06:07.920 | I mean, this is changing.
02:06:08.880 | The world is changing.
02:06:09.880 | But certain departments lagged
02:06:13.120 | on the programming side of things,
02:06:15.240 | on developing good habits in software engineering.
02:06:18.440 | But I think that's more and more changing.
02:06:20.840 | And students can take that into their own hands,
02:06:24.700 | like learn to program.
02:06:26.000 | I feel like everybody should learn to program,
02:06:30.680 | because it, like everyone in the sciences,
02:06:34.560 | 'cause it empowers, it puts the data at your fingertips.
02:06:37.840 | So you can organize it.
02:06:38.840 | You can find all kinds of things in the data.
02:06:41.440 | And then you can also, for the appropriate sciences,
02:06:44.440 | build systems that, like based on that.
02:06:48.000 | So like then engineer intelligence systems.
02:06:50.160 | We already talked about mortality.
02:06:53.160 | So we hit a ridiculous point.
02:06:57.480 | But let me ask you the,
02:07:04.520 | one of the things about intelligence is it's goal-driven.
02:07:09.520 | And you study the brain.
02:07:12.800 | So the question is like,
02:07:13.680 | what's the goal that the brain is operating under?
02:07:15.960 | What's the meaning of it all for us humans, in your view?
02:07:19.040 | What's the meaning of life?
02:07:21.200 | (laughing)
02:07:22.560 | - The meaning of life is whatever you construct out of it.
02:07:25.000 | It's completely open.
02:07:26.440 | - It's open?
02:07:27.280 | - Yeah.
02:07:28.120 | - So there's nothing, like you mentioned,
02:07:31.920 | you like constraints.
02:07:32.840 | So there's, what's, it's wide open.
02:07:36.800 | Is there some useful aspect that you think about
02:07:41.720 | in terms of like the openness of it
02:07:45.240 | and just the basic mechanisms of generating goals
02:07:48.240 | in studying cognition in the brain that you think about?
02:07:54.400 | Or is it just about, 'cause everything we've talked about,
02:07:57.080 | kind of the perception system,
02:07:58.400 | is to understand the environment.
02:07:59.880 | That's like to be able to like not die.
02:08:02.480 | - Correct, exactly.
02:08:04.120 | - Like not fall over and like be able to,
02:08:06.640 | you don't think we need to think about
02:08:11.240 | anything bigger than that?
02:08:13.200 | - Yeah, I think so.
02:08:14.040 | Because it's basically being able to understand
02:08:18.000 | the machinery of the world,
02:08:19.360 | such that you can pursue whatever goals you want, right?
02:08:23.080 | - So the machinery of the world is really ultimately
02:08:26.080 | what we should be striving to understand.
02:08:27.920 | The rest is just whatever the heck you wanna do,
02:08:31.960 | or whatever fun you--
02:08:32.800 | - Whatever is culturally popular, you know?
02:08:34.640 | (laughing)
02:08:37.440 | - I think that's beautifully put.
02:08:41.120 | I don't think there's a better way to end it.
02:08:45.240 | I'm so honored that you would show up here
02:08:48.560 | and waste your time with me.
02:08:49.880 | It's been an awesome conversation.
02:08:51.160 | Thanks so much for talking today.
02:08:52.440 | - Oh, thank you so much.
02:08:53.520 | This was so much more fun than I expected.
02:08:56.320 | (laughing)
02:08:57.640 | Thank you.
02:08:58.480 | - Thanks for listening to this conversation
02:09:02.000 | with Dileep George.
02:09:02.000 | And thank you to our sponsors,
02:09:04.080 | Babbel, Raycon Earbuds, and Masterclass.
02:09:07.760 | Please consider supporting this podcast
02:09:09.760 | by going to babbel.com and use code Lex,
02:09:13.240 | going to buyraycon.com/lex,
02:09:16.160 | and signing up at masterclass.com/lex.
02:09:19.760 | Click the links, get the discount.
02:09:21.560 | It really is the best way to support this podcast.
02:09:24.280 | If you enjoy this thing, subscribe on YouTube,
02:09:26.840 | review it with five stars on Apple Podcasts,
02:09:29.080 | support it on Patreon, or connect with me on Twitter,
02:09:32.360 | @LexFriedman, spelled, yes, without the E,
02:09:37.360 | just F-R-I-D-M-A-N.
02:09:40.040 | And now let me leave you with some words
02:09:42.160 | from Marcus Aurelius.
02:09:44.880 | You have power over your mind, not outside events.
02:09:49.180 | Realize this and you will find strength.
02:09:52.180 | Thank you for listening and hope to see you next time.
02:09:56.560 | (upbeat music)